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ABSTRACT 

We present a comprehensive statistical analysis of the properties of Type la SN light curves in 
the near infrared using recent data from PAIRITEL and the literature. We construct a hierarchical 
Bayesian framework, incorporating several uncertainties including photometric error, peculiar veloci- 
ties, dust extinction and intrinsic variations, for principled and coherent statistical inference. SN la 
light curve inferences are drawn from the global posterior probability of parameters describing both 
individual supernovae and the population conditioned on the entire SN la NIR dataset. The logical 
structure of the hierarchical model is represented by a directed acyclic graph. Fully Bayesian analysis 
of the model and data is enabled by an efficient MCMC algorithm exploiting the conditional proba- 
bilistic structure using Gibbs sampling. We apply this framework to the JHKg SN la light curve data. 
A new light curve model captures the observed J-band light curve shape variations. The marginal 
intrinsic variances in peak absolute magnitudes are: a{Mj) = 0.17 ± 0.03, <j{Mh) = 0.11 ± 0.03, 
and a{MKs) — 0.19 ± 0.04. We describe the first quantitative evidence for correlations between the 
NIR absolute magnitudes and J-band light curve shapes, and demonstrate their utility for distance 
estimation. The average residual in the Hubble diagram for the training set SN at cz > 2000 km s""'^ 
is 0.10 mag. The new application of bootstrap cross-validation to SN la light curve inference tests the 
sensitivity of the statistical model fit to the finite sample and estimates the prediction error at 0.15 
mag. These results demonstrate that SN la NIR light curves are as effective as corrected optical light 
curves, and, because they are less vulnerable to dust absorption, they have great potential as precise 
and accurate cosmological distance indicators. 
Subject headings: distance scale - supernovae: general 



> 
cn 
p 

00 

o 

O 



1. INTRODUCTION 

Type la supernova (SN la) rest-frame optical light 
curves have been of great utility for measuring of fun- 
damental quantities of the universe. As standardizable 
candles, the y were critical to the detection of cosmi c 
acceleration (jRiess et al.lll998t iPerlmutter et al.l Il999f ). 
The cosmic acceleration may be caused by a dark en- 
ergy component of the universe (|Frieman et al.ll2008l pro- 
vide a recent review). SN la have been used to con- 
strain the equation-of- state parameter w of dark energy 
(jCarnavich et a l. 1998), and recent eff orts have measured 
w to 10%, (TWood- Vascv ct al.. ,2007: lAstier et al.l 120061 : 
iKowalski et al.ll20b 8: Hickc n et al.ll2009bl ). SNTa have 
also been used to establish the extra galactic distance 
scale and measure the Hubble constant jlFreednian et al.l 
I2OOII IJha et al.l[l999l : IRiess et al.l[2005l I2009allbl 'l. 

The effectiveness of SN la as distance indicators has 
been improved greatly by the construction of empir- 
ical methods that exploit relationships between peak 
optical luminosities of SN la and distance-independent 
measures such as light curve shape or color that 
have been observed in the burgeoning sample of 
nearby low-z SN la (Ham uv et al.l IT996al: IRiess et al.l 
119991 : IJha et al.l [2006 : Hicken et al.l bo09af). Methods 
have included Amis fi?) (Phillips 1993: H amuv et al.l 
199&a IPhiU ips et al l 11999,'). MF CS (.Riess et ail 199631 
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CMAGIC fW ang et al.l [200l . and SALT l|Guv et all 
2005, 2007). The largest systematic uncertainty that lim- 
its the precision of rest-frame optical light curves is dust 
extinction in the host galaxy and the entanglement of 
dust reddening with the intrinsic color variations of SN 
(e.g.. lCCTde v et al.'l 2007D . 

Early obscrvations_jof SN l a in th e infrared were 
made byE irshncr ct al.' (l973): Ehas ct aL (1981, 1985); 
iFrogel et al . (1987) and Graham et al.i (,1988). Studies 
of nearby SN la light curves in the NIR have found the 
peak near-infrared luminosities of SN la have a disper- 
sion s maller than ±0.20 mas . jElias et all 119851 : IMeikld 
120001 : iKrisciimas et all L2004a.'c). Furthermore, the ef- 
fect of dust extinction is significantly diminished at near- 
infrared wavelengths, relative to the optical. The com- 
bination of optical and near-infrared observations of SN 
la light curves could le ad to even better SN la distances 
(iKrisciunas et al."2 00"7l)- 

Wood-Vascv ct al.l (|2008[ ) (hereafter WV08) compiled 
the largest homogeneous sample of NIR SN la observa- 
tions, taken with the Peters Automated InfraRe d Imag- 
ing TELescope (PAIRITEL: iBloom et alll2006f) . After 
combining these with NIR light curve observations from 
the literature to yield a sample of 41 NIR SN la, they 
constructed template light curves by interpolating and 
smoothing the data. They measured the scatter in the 
absolute magnitudes at time of B maximum in each of 
the J, H, and Kg bands and found a{MH) ~ 0.15 mag, 
ct(Mj) r:! 0.33 mag and a{MKj « 0.26 mag. This analy- 
sis did not take into account NIR light curve shape vari- 
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ations, but it was found, as in iKrisciunas et aTl ()2004af ). 
that the Hubble diagram residuals had no trend with the 
optical light curve width. 

The purpose of this paper is twofold. First, we formu- 
late the hierarchical Bayesian approach to probabilistic 
inference with SN la light curves in general. A proper 
Bayesian approach provides a principled, coherent frame- 
work for inference based on the joint probability density 
over all quantities of interest conditioned on the avail- 
able data. It is natural to employ a simultaneous multi- 
level approach and derive joint probability densities over 
the parameters of individual supernova light curves, their 
distance moduli, and also the variables that describe the 
population, including those governing the joint proba- 
bility distributions over multi-band absolute magnitudes 
and light curve shape parameters. This approach en- 
ables consistent statistical inference of all hierarchical 
parameters, by coherently incorporating several sources 
of uncertainty, including peculiar velocity uncertainties, 
photometric measurement errors, intrinsic randomness of 
light curves, and dust extinction into the global poste- 
rior probability density conditioned on the entire data set 
simultaneously. This framework leads to a natural and 
consistent method for probabilistic distance prediction 
with new SN la light curve data. The logical structure 
of our hierarchical model for SN la light curve inference 
is demonstrated by the equivalent directed acyclic graph 
(DAG), a graphical model that facilitates easy inspection 
of the probabilistic relationships. 

Although the probabilities for fully Bayesian analysis 
of SN la light curves are simple to write down, the joint 
posterior distribution is generally non-gaussian and dif- 
ficult to evaluate. To enable probabilistic inference, we 
have developed a Markov Chain Monte Carlo (MCMC) 
algorithm, BayeSN, designed to exploit the conditional 
probabilistic structure using Gibbs sampling. We em- 
ploy this code for both training the statistical model and 
using the model to predict distances. The use of ad- 
vanced sampling methods facilitates the computation of 
marginal probabilities of parameters from the global joint 
posterior density. 

In the second part of the paper, we apply this frame- 
work to the NIR SN la light curve data from the com- 
pilation of WV08. We first construct model template 
light curves for the JHKg bands. We compute fixed 
maximum likelihood template models between -10 and 
20 days for the H and Kg using all available data. The 
J-band data is typically much less noisy than the H and 
K, so we construct an extensive J-band model between 
-10 and 60 days that accounts for light curve variations, 
in particular the structure around the second maximum. 
Next, we apply the BayeSN method to simultaneously 
(1) fit the individual JHKg light curves, (2) compute the 
population characteristics, especially the absolute magni- 
tude variances and covariances with J-band light curve 
shape and (3) estimate the joint and marginal uncer- 
tainties over all hierarchical parameters. We construct a 
Hubble diagram for the training set SN la and compute 
its residual errors. 

The average Hubble diagram residual of the training 
set SN is an optimistic assessment of the predictive abil- 
ity of the statistical model for SN la light curves because 
it uses the SN data twice: first for estimating the model 
parameters (training), and second in evaluating the er- 



ror of its "predictions" . Hence, the residuals, or train- 
ing errors, underestimate the expected prediction error. 
This effect is present for all models based on finite train- 
ing data, and is particularly important for small sample 
sizes. We perform bootstrap cross- validation to realis- 
tically estimate the out-of-sample prediction error and 
to test the sensitivity to the finite NIR SN la sample. 
This technique ensures that the same SN are not simul- 
taneously used for training and prediction. It has not 
been used previously in SN la statistical modeling and 
inference. 

This paper demonstrates hierarchical Bayesian model- 
ing and distance estimation for SN la light curves in the 
NIR only. The application of these methods to combined 
optical and NIR light curves for the estimation of dust 
and distances will be described in a subsequent paper 
(Mandel et al. 2009, in prep.). 

This paper is organized as follows: In §2, we describe 
the hierarchical Bayesian framework for SN la light curve 
inference. The structure of the hierarchical model can be 
depicted formally as a directed acyclic graph presented 
in §2.3. In §2.4, we describe BayeSN, an MCMC algo- 
rithm designed for computing posterior inferences in the 
hierarchical framework. In §3, the construction of tem- 
plate light curve models in JHKg is described. In §4, 
we apply this approach to the NIR light curve data, and 
summarize the posterior inferences for both individual 
supernovae and the population. In §4.4, we construct 
Hubble diagrams by applying the statistical model and 
describe the application of bootstrap cross-validation to 
estimate prediction error. In §4.5, we discuss the poten- 
tial impact of dust in the NIR sample. We conclude in §5. 
In appendix ^TKl we briefly review the conditional inde- 
pendence properties of graphical models. Appendix SjB] 
presents mathematical details of the BayeSN method, 
and appendix SjC] describes its use in practice. 



2. HIERARCHICAL BAYESIAN FRAMEWORK FOR SN lA 
LIGHT CURVE INFERENCE 

Simple Bayesian analysis describes an inference prob- 
lem in which a generative model Ti. with a free param- 
eter 9 is assumed to underly the observed data V. The 
Bayesian paradigm is to derive inferences on 9 from the 
posterior density of the parameter conditioned on the 
data: P{9\V,H) oc P{V\9,n)P{9\n), where the first 
factor is the likelihood function and the second factor is 
the prior on the model parameter. 

Hierarchical, or multi-level, Bayesian analysis is a mod- 
ern paradigm of statistical modeling, which enables the 
expression of rich probabilistic models with co mplex 
structure on multiple logical levels (jCelman et al.l[2003.) . 
For example, if Vi represents the data on individual i, 
and 9i is a model parameter describing i, the values of 
9i themselves may be drawn from a prior or population 
distribution P(9i\a, f3), which in turn depends on un- 
known variables that describe the group level probabilis- 
tic model. These unknown variables a, (3 are termed hy- 
perparameters to distinguish them from the individual 
level parameters 9i. The hierarchical Bayesian joint pos- 
terior distribution over all parameters {9i\ and hyper- 
parameters a,/3 conditioned on the data for many (TV) 
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individuals V — {Vi} is then: 



P{{e,}-a,p\V) a 



N 



\{p{v,\e,)p{e,\a,p) 



.i=l 



P{a,P) 



(1) 

where the last factor represents the hyperprior density on 
the hyperparameters. The fully Bayesian approach is to 
analyze the full joint posterior density of all parameters 
and hyperparameters simultaneously conditioned on the 
entire data set. This ensures the complete and consistent 
accounting of uncertainty over all the inferred parame- 
ters. We can build in complex probabilistic structure 
by layering single- level models, at the individual level 
and also at possibly multiple population levels, and ex- 
pressing the conditional relationships that connect them. 
The hierarchical Bayesian paradigm is very well suited 
to combining information and uncertainties from many 
logical sources of randomness, interacting in non-trivial 
ways, in a principled, coherent and consistent statisti- 
cal framework for studying structured data. This is the 
strategy we adopt in this paper. 

In contrast to more classical methods of model fitting, 
the Bayesian approach is less concerned with the opti- 
mization problem, i.e. finding the "best" values or point 
estimates of model parameters fitted to given data, and 
more concerned with the construction of the full joint 
probability model, consistent integration over uncertain- 
ties, and the simulation and sampling of the joint pos- 
terior density. For most non-trivial models, the inte- 
grations of interest are analytically intractable, so we 
employ modern computation techniques to enable prob- 
abilistic inference. We introduce an MCMC algorithm 
(BayeSN) that uses stochastic simulation to calculate 
posterior inferences from the hierarchical framework. 

2.1. SN la Light Curve Models 

A central task in the statistical analysis of Type la SN 
light curve data is fitting empirical light curve models to 
time-series photometric data in multiple passbands. A 
data set for an individual supernova s, {2?^}, consists of 
observations in n photometric filters, F € {F^, . . . 
(e.g. {B,V,J,H}). Let = {U,m[,alJ^J^ be the 
set of Np observations in band F. We assume th ese have 
already been corrected for time dilation ( (Blondin et al.l 
|2008() and if-corrected to the supernova rest frame, so 
that ti is the rest-frame phase of the observation refer- 
enced to the time of maximum light in B-band (i.e. t — 



corresponds to 



is the apparent magnitude in 



filter F at this phase, and ap ^ is the measurement vari- 
ance. 

We wish to fit to this data a light curve model, Fq + 
l^{t, 9^), where Fq is the F-band apparent magnitude of 
the model at a reference time t = 0, and l^{t, 6^) is the 
normalized light curve model such that /^(0, 9^) = 0. A 
vector of light curve shape parameters, 9^, governs the 
functional form of the light curve models and may take 
on different values for each supernova. 

At the present time there is no known set of physical 
functions l^{t,9^) describing the temporal evolution of 
supernovae light curves. Theoretical SN la models alone 
provide insufficient guidance, so these functions must be 
constructed from the light curve data itself. Typical pa- 



rameterizations are motivated by a combination of sim- 
plicity, intuition, mathematical convenience and the em- 
pirical study of detailed light curve data of a large sample 
of supernovae. The ultimate utility of a functional form 
lies in its ability to fit the observed modes of variation in 
data and capture the observable information in the light 
curves. Examples of light curve shape functions are: 



1. "Stretch" template method (jPerlmutter et al.l 
[19991: IGoldhaber et al.l[200l : l^{t,e^) = fis^t), 
where the shape parameter is 9^ — , the 
"stretch" factor in filter F, and /(•) is a fidu- 
cial light curve, e.g. the Leibundgut template 
([Leibundgutl 1 1 9891 ) . which is a function of the rest- 
frame supernova phase with respect to to, the time 

of i?max. 

2. The Ami^(B) dec l ine rate paramete rization 
(|Hamuv et all Il996bl : IPhillips et all [19991 ). Here 
the light curve shape parameter is 9^ = Ami5{B), 
the magnitude decline between the peak and 15 
days after the peak in i?-band. The light curve 
function is defined at particular values of 9f us- 
ing BVI light curve templates generated from ob- 
served supernovae. Interpolation is used to fit 
light curves at int ermediate values of Ami5(_B). 
IPrieto et al.l (j2006[ ) presented an updated formu- 
lation. 



3. MLCS (|Riess et al.l[T996al ^99^ I Jha et al.l[200l : 

The light curve model in UBRVI is of the form: 
?^(i,A) = Fo{t) + APF{t) + A''QF{t), where Fo(t), 
Ppit), and Qpit) are defined by templates. The 
light curve shape parameter is 9^ = A. 

In this section, we do not assume any particular form 
for the light curve models l^{t;9^). The results will be 
applicable to a broad class of possible models. Without 
loss of generality, light curve model can depend on some 
parameters (©l ) linearly and others (S^l) nonlinearly. 
Hence, a general form of a light curve model in band F 
for the data is 



Fq + loiti; ^nl) + ^nl) • + 4 



(2) 



where li{t;9^j^) is a vector of coefficients to the linear 
parameters. It is convenient for computational purposes 
to separate the linear from nonlinear shape parameters. 
The parameter Fq could be considered a linear parame- 
ter and included with ©l- However, they are physically 
distinct quantities, as Fq sets the apparent magnitude 
scale for the light curve whereas generally models the 
shape of the light curve, so we keep them separate. 

2.2. Constructing the Clohal Posterior Density 
2.2.1. Light Curve Likelihood Function 
Assuming Gaussian noise for ef , we can write the likeli- 
hood of the light curve model parameters 9^ = (0^ , ©nl) 
for a single SN data set T>^ in one band F. Define the 
vectors and matrices: 



{m^{ti),...,m^{tN^)y 



(3) 



Lq (^nl) - Go (^i; ^nl)i ■ ■ ■ : ^0 (^Afj-; ^nl)) (4) 



4 



Mandel et al. 



(5) 



Let us construct a vector of ones, 1, of the same length 
as the data mf, and a measurement error covariance 
matrix . Due to the practical difficulties of estimat- 
ing the error covariances, the current standard assump- 
tion is that the error terms ef are independent, so that 
Wjf = cr^i is diagonal. The likelihood function for the 
light curve can be compactly written as: 



N{m^\ IFo + L^{9^^) + if ' W 



(6) 



where N{x\ fj,x,'^x) denotes the multivariate normal 
density in the random vector x with mean fi^ and covari- 
ance Sa;. Since the photometric observations in multiple 
filters are sampled with independent noise, the likelihood 
function of all light curve parameters over all bands given 
the multi-band data {2? } is the simple product of n 
single-filter likelihoods. 



Pi{v''}\4>) = l[Piv^\Fo,e' 



(7) 



We define the observable (or apparent) parameters vec- 
tor cj) = (i^o\ . . . , Fq"; . . . , 0^"). This vector, with 
the light curve model functions l^{t;6^), encodes all 
the information needed to reconstruct the apparent light 
curve of a single supernova, i.e. the apparent magni- 
tudes at the reference time and the light curve shape 
parameters in each of n photometric bands F. Similarly, 
we define the intrinsic (or absolute) parameters vector 

tp = {Mpi Mf" ; 6'^\..., 9^" ) encoding aU the in- 
formation describing the absolute light curves of the su- 
pernova. The absolute magnitude at peak in filter F is 
Mp = Fq ~ ^ ~ Ap , where /j is the distance modulus and 
Ap is the dust absorption in that filter, for a particular 
supernova. 

2.2.2. Redshift- Distance Likelihood Function 

Type la SN can be used as distance indicators because 
we possess some knowledge of their relative distances in 
the low redshift regime (where they are independent of 
the cosmological parameters ^Imt^a, and w) from the 
Hubble law and measured redshifts to the host galax- 
ies of the SN. However, inference of true luminosities 
and absolute distances requires external calibration (e.g. 
from Cepheids) or knowledge of the Hubble constant. Ho, 
which has not yet been independently measured to high 
precision. If we are concerned only with relative distance 
estimation, it is sufficient to fix the distance scale with an 
assumed h = Ho/100 km a~^. The uncertainty in these 
local distances is then dominated by the peculiar velocity 
field, which we model probabilistically. 

Let Zc be the cosmological redshift of a SN. The mea- 
sured redshift is z, with measurement variance a^, cor- 
rected to th e CMB and the local infall flow model of 
iMould et al.l ()200ClD . In a smooth cosmological model, 
the distance modulus is related to Zc- fj. = f{zc) — 
25 -I- 5logiQ[dhizc) Mpc~^], where di,{zc) is the luminos- 
ity distance in Mpc. If we model the effect of random 
peculiar velocity as a Gaussian noise with variance cr^^^, 

then z = Zc + N{Q,ap^^/c'^ + al), and the likehhood func- 



tion is P{z\ij) = N{z\f-\ii),al^Jc^ + al). The pos- 
terior density of the distance modulus conditioning only 
on the redshift is P(/i|z) oc P(z|/i). We use a flat prior 
P(/i) oc 1, since we have no a priori knowledge about 
/i without the data. For recession velocities cz ^ Upec, 
/(zc) can be linearized about the fixed z to find f~^{^), 
so that, to a good approximation, 

P{^^\z)=N[^^\fiz),al = [/'(z)]2(a^, Jc^ -|- a^)]. (8) 

In the low-z regime, where di^{z) is linear in z (the Hub- 
ble law), the variance is 



;lnlO 



pec 



(9) 



Under the assumption that peculiar velocity uncer- 
tainty amounts to Gaussian noise, at recession velocities 
cz < 5apec, the approximation of P{fi\z) with a nor- 
mal distribution breaks down, due to the non-linearity 
of the logarithm. This effect is inconsequential for our 
analysis because even though the distribution becomes 
non-Gaussian, at such low recession velocities, its width 
in magnitudes is much larger than the dispersion in SN 
la absolute magnitudes. The redshift of a very low-z 
SN la carries little information about the absolute mag- 
nitude, so that P{ii\z) is essentially fiat over the width 
of the posterior density in /i conditioning on the light 
curves of the SN. Hence, the exact form of P(/i|z) is ir- 
relevant in this regime. SN la light curves can be used 
to infer the distances of these near-fiel d supernovae and 
to measure the the local velocity field (iRiess et al.l 119951 : 
IHaugbolle eTaLl [20071 : iNeill et al.l[200l . 

2.2.3. The SN la Population Distribution 

The utility of Type la SN for cosmological studies 
lies in the observed correlation of their peak luminosi- 
ties with the shapes of their optical light curves. Al- 
though the peak optical luminosities of SN la range over 
a factor of 3, using correlations with light curve shape 
reduces the scatter about the Hubble line to less than 
~ 0.20 mag. Physical modeling and simulation of SN la 
progenitors may provide useful explanations for the ob- 
served relationships between the observable properties of 
SN la and their peak luminosities. Such work may also 
describe detailed probabilistic relationships between the 
two. For example, we can define the joint population dis- 
tribution of absolute magnitudes (at the reference time) 
and the observable light curve shapes in multiple pass- 
bands. In our notation this population distribution is 
P(i/? I Physical Parameters) where the "Physical Param- 
eters" may include, for example, the mass of the progen- 
itor, the chemical composition and distribution within 
the pr ogenitor, and the details of the explosion mecha- 
nism. iHillebrandt k NiemeveJ (|2000D provide a review 
of progress in SN la explosion modeling. 

In the absence of such detailed information, we 
learn the probabilistic relationships from the data. 
We describe the SN la population distribution as 
P(i/j| /x^, S^). Here, /j.^ is a vector of hyperparameters 
that describe the average intrinsic characteristics, and 
is a collection of hyperparameters describing the dis- 
persion (variance) and correlations of the intrinsic char- 
acteristics of absolute light curves. Since we have no 
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information on these hyperparameters a priori^ we seek 
to estimate them (and their uncertainties) from the data. 

We win include this distribution in the global posterior 
density in the mathematical form of a "prior" on the in- 
trinsic parameters ip oi a. single SN. However, it is better 
to think of this distribution as a "population" distribu- 
tion from which the intrinsic parameters are randomly 
drawn. Its hyperparameters are unknown and must be 
estimated simultaneously from the data. It has a dif- 
ferent interpretation in the context of the hierarchical 
model than the fixed prior of the simple Bayesian treat- 
ment (which has no hyperparameters to be estimated). 

Since we have no a priori information on the functional 
form of P(-| /Lt^, S^), we must make some assumptions. 
The simplest choice for a multivariate probability den- 
sity that models correlations between parameters is the 
multivariate Gaussian. In the rest of this paper we will 
assume P(-|/x^,S^) — 7V(-| /x,^, S^) with an unknown 
mean vector /x^, = £(■0^) and unknown covariance ma- 
trix — Yai[xl}s). The intrinsic parameters of individ- 
ual supernovae are independent, identically distributed 
random variables drawn from this probability density: 
\j) ~ N{iJ,^,Yi^). If the data indicate a different distri- 
bution from the one we have assumed, we can attempt 
another choice of the form of the intrinsic population 
distribution. 

The population hyperparameters /x^, are the most 
important variables in this hierarchical framework. Dur- 
ing the training process, they model the intrinsic sta- 
tistical properties of the SN la light curves, including 
the average behavior, intrinsic variability, correlations 
between different modes of light curve shape variation, 
correlations between absolute magnitudes in different fil- 
ters, and cross-correlations between light curve shape pa- 
rameters and the absolute magnitudes. When the model 
is used to make predictions, they are crucial for using 
this information, and its uncertainty, to make distance 
estimates for new SN la light curves. 

2.2.4. Incorporating Dust Information 

Dust along the line of sight from the supernova to 
the observer causes both extinction and reddening of the 
emitted light. These effects originate from Galactic dust, 
which has been measured and mapped (jSchlegel et al.l 
I1998D . and dust in the supernova's host galaxy, which 
is often more important, poorly understood and is cur- 
rently the largest systematic uncertainty in cosmologi- 
cal inference with SN la (jConlev et al.ll2007[ ). Previous 
efforts to esti mate dust extinctio n from SN la color ex - 
cess es include iRiess et al.l (ll996bD , iPhilhps et all (|1999[ ) , 
and iKrisciunas et al.l ( 2000[ r 

We incorporate the effects of host galaxy dust on su- 
pernova observations probabilistically within the full sta- 
tistical model as follows. The extinction in a given 
passban d is denoted Ap- Assuming a CCM redden- 
ing law (jCardelH et al.|[T989f) . the extinction in a given 
band can be related to the visual extinction Ay by 



Af/A^ 



ap + hpRy , so that there are two free pa- 



rameters: the magnitude of visual extinction Ay and 
the slope of the extinction law in the optical bands Ry- 
The fixed regression coefficents ap and hp are determined 
from the dust r eddening analysis of supernova spectra. 
iJha et ai] (|2007[ ) suggested using an exponential prior on 
the nonnegative Ay extinction to a particular supernova. 



Interpreted as a population distribution, this can be in- 
corporated into our framework as: 

P{Ay,Ry\TAv,OLR) = ExpOn(Ay |T^^, )P(i?y | OLr) 

where is a hyperparameter describing the exponen- 
tial scale length or the average amount of visual extinc- 
tion to the population of SN la. The form of the popula- 
tion distribution of the Ry is unknown, but we suppose 
it may be specified with hyperparameters ctfl. For exam- 
ple, if Ry is fixed to a single known value, e.g. an — 1.7 
or 3.1, we may set P{Ry\aR) = S{Ry — an). Or one may 
allow the Ry to vary within a population probability den- 
sity, e.g. Ry N{^pi,a\), where the hyperparameters 
OLpi = [fJLR, (t|j) may be fixed or we may attempt to learn 
them from the data, if the data is sufficiently informa- 
tive. It is not known a priori whether Ay and Ry can 
be treated as independent random variables and if the 
population probability density of {Ay,Ry) is separable 
as indicated in Eq. [TUl 

When modeling the near infrared observations, it 
makes more sense to reference the extinction values to 
the ff-band, rather than to Ay . For given Ah , Ry val- 
ues, the extinction in any band, Ap can be computed 
from the dust law. The ratio of near infrared to visual 
extinction is roughly Ah /Ay ~ 0.2. Since the behav- 
ior of dust in the near-infared is relatively insensitive to 
the slope of the reddening law in the optical bands, Ry, 
it makes sense to set it to a fixed representative global 
value an- With this choice, the extinction population 
distribution is: 

P[AH,Ry\TAHiCtR) = Expon(^H|rA„)(5(i?v - a^) 

where tah is the exponential scale length in magnitudes 
of the population distribution of _ff-band extinctions. To 
simplify the notation, we denote this hyperparameter as 
TA = TAm which controls the dust scale for all filters 
through the reddening law. 

2.2.5. Conditionally Conjugate Hyperpriors 

To estimate the population hyperparameters /x^ and 
S^, we make our priors on them explicit, called hy- 
perpriors. If we lack any external motivating evidence, 
we should choose non-informative or diffuse hyperpriors. 
Additionally, it is convenient to choose the hyperprior 
from a parametric family that is conditionally conjugate 
to the parametric family of the population density. This 
means that the posterior density of the hyperparameters 
conditioned on the values of the other parameters and 
data is from the same parametric family of probability 
densities as the hyperprior. This property is advanta- 
geous because if one can sample directly (generate ran- 
dom numbers from) the hyperprior density, then one can 
sample directly from the conditional posterior density 
of the hyperparameters. This is useful for constructing 
Markov chains for statistical computation of the poste- 
rior density using Gibbs sampling (section [2. 4p . 

The hyperparameters and describe a multivari- 
ate normal density on ^ps■ The conjugate family to the 
multivariate normal with unknown mean and covariance 
matrix is the Normal-Inverse- Wishart. This hyperprior 
can be expressed as P(/x^, S^) = P(/Xi/,| S^)P(S^) such 
that 



Inv-Wishartj,(, (Aq ) 



(12) 
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The non-informative or diffuse conditionally conjugate 
density is obtained in the lim it as kq — > 0, vq — > —1, 
|Aol — > (jGelman et al.l [20031 ) (the conventions regard- 
ing the Wishart distributions differ: we choose the con- 
vention that if W ~ Inv-Wishart,y(S'~'^) is a random ma- 
trix, then E(VF) = 5'/ (ly — d—l), where d is the dimension 
of the d X d covariance matrix). In this limit, the hyper- 
prior of the population mean /x^ becomes flat and the 
hyperprior of the covariance is a diffuse distribution 
over the space of positive semi-definite matrices, so that 
the hyperprior does not favor any particular solution. 

For the extinction exponential scale length Tyi > 0, 
we choose a uniform positive hyperprior, expressing no 
prior preference for a particular value. If this is viewed as 
an Inv-Gamma(— 1, 0) density on ta, it is conditionally 
conjugate to the exponential distribution. 

2.2.6. The Global Posterior Density 

We now have the elements necessary to construct the 
full joint posterior density of the sample of SN la. A sin- 
gle SN s with multi- band light curve data Vg = {T^f}, 
and redshift Zs, is described by intrinsic light curve pa- 
rameters ■j/'s, observable parameters (ps, and distance 
modulus fis, with dust extinction modeled by A'y and 
Ry. The relations between these parameters can be en- 
coded as follows. Let be a constant indicator vector 
with the jth component defined as: 

1, if (f>^,^p^ are magnitudes, e.g. Fq or Mp 
0, if (t>',^-' are shape parameters. 

(14) 

Furthermore, define the vectors Ag with jth component 
Al: if and ipl are magnitudes in band F, then 



(15) 



otherwise, A^ = if (l)^ and ipl are shape parameters. 
The non-zero components depend on the host galaxy red- 
dening law and i?-band extinction, A^^' is the Galactic 
extinction, and Ap^Aj^, Ry) is the dust extinction in fil- 
ter F as a function of Aj^ and Ry using the dust law. 
The relationship between the intrinsic and observable pa- 
rameters of supernova s can then be written compactly 
as: 

Cps^lps+Vfis+As. (16) 

This equation encodes the relationship between appar- 
ent magnitudes, absolute magnitudes, extinction and dis- 
tance moduli. In this expression, neither dust nor dis- 
tance modify the light curve shape parameters, 9f com- 
mon to both the observable (ps and intrinsic ips vectors. 
The joint posterior probability density for the parame- 
ters of a single supernova, conditioned on the values of 
the hyperparameters and the data, is proportional to the 
product of 

• the probability of observing the photometric data 
given the apparent light curve, 

• the probability of the distance modulus given the 
measured redshift, 

• the probability of an absolute light curve equal to 
the apparent light curve minus the distance modu- 
lus and extinction, and 



• the probability of the extinction value and dust law, 

conditioned on the population hyperparameters of the 
absolute light curves and dust properties: 



P{4)s,fJ:s,A%,Ry\ Vs, Zs;n,p, S^, TA, aR) 

CX P{Vs\<t>s) X P{^ls\Zs) 

X P{^|Js = 4>s -vfis - As\fJ.-4,,'Eji,) 
X P{A'lj,Ry\TA,aR). 



(17) 



Now consider the full database of SN la light curves V = 
{Vs} with measured cosmological redshifts Z = {zs}. 
The global joint posterior density of all supernova ob- 
servables {4>s}, distance moduli {/is}, dust parameters 
{Ajj,Ry} and the population hyperparameters condi- 
tioned on the database 7), Z is proportional to the prod- 
uct of A^sN individual conditional posterior densities mul- 
tiplied by the hyperpriors. 



P{{cl)s,fis,A%, Ry}; fi^, S.^, TA, anl T>, Z) 

Y[ P{ct}s,fJ-s,Ajf,Ry \ Vs,Zs;fi-4,,'E^,,TA,aR,) 



.8 = 1 



P(/i^,S^) X P{TA,aB.) 



(18) 



For full generality we have derived the global joint pos- 
terior density in the case that we wish to estimate the 
probable values of the hyperparameters an from the 
data. If we fix the Ry to a fixed global value an, this 
is equivalent to evaluating the above joint density condi- 
tioned on Ry = Ry = aR. All fully Bayesian inferences 
on the remaining parameters and hyperparameters are 
based on mapping out this global joint posterior density. 
The marginal posterior density of the hyperparameters, 
P{p^,'E^,TA\T>,Z,Ry) or P{fj,^,'E^,,TA,aR\T),Z) is 
obtained by integration over the individual SN param- 
eters {(f)s,fJ's,A%,Rlr} or {4>s,fis,A%}. 

We refer to the set of SN in I?, Z as a training set, and 
"training" means computing the marginal posterior den- 
sity of the hyperparameters, conditioned on this data. 
In the Bayesian paradigm we are interested not just in 
point estimates of the hyperparameters, e.g. "best val- 
ues" fi^,Tl^,TA, but on their joint posterior probability 
density as a quantification of their uncertainties. 



2.2.7. The Predictive Posterior Density 

The ultimate purpose of SN la light curve inference 
is to estimate luminosity distances to distant SN that 
are not included in the nearby, low-z training set. That 
is, given observations of a new supernova's multi-band 
light curve T^s, we wish to fit the light curve model and 
to predict the distance modulus. Prediction differs from 
training in the fact that we do not use any prior infor- 
mation on the distance (from e.g. the redshift) in our 
probability calculus. The predictive posterior density for 
the new supernova s (with parameters denoted by tilde) 
conditioned on the population hyperparameters and the 
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new light curve data 2?s is: 



conditional probabilities (the factorization): 



X P{A''H,Rv\TA,aR). 



N 



P{{S^}) = n^(^» I {Parents of 9,}) 



(20) 



(19) 



We must also incorporate our (joint) uncertainties of 
the hyperparameters. This is encapsulated in the 
marginal posterior density of the hyperparameters from 
the training set. The full predictive posterior proba- 
bility density for the new supernova s is the previous 
expression multiplied by the training posterior density 
P(/i.^, S^, r^, Qfi^ I P, Z) and integrated over the prob- 
ability of the hyperparameters /Li,/,, S^, r^, aij. The 
marginal predictive posterior density of the new su- 
pernova's distance modulus /is, P{ils\'Dg,'D, Z), is ob- 
tained by integrating this over the remaining parameters, 

4'S, A^jj, Ry. 



2.3. Representation as a Directed Acyclic Graph 

We have constructed the posterior density of all indi- 
vidual parameters and population-level hyperparameters 
conditioned on the observed datasct of multi-band SN la 
light curves. This was done by layering relationships of 
conditional probability. All hierarchical joint probabil- 
ity densities of data and parameters can be represented 
in terms of a probabilistic graphical model known as a 
directed acyclic graph. The graph consists of nodes rep- 
resenting parameters and data connected by arrows that 
represent probabilistic dependencies. It obeys the re- 
striction that there are no directed cycles, i.e. it is impos- 
sible to move from any node along the arrows and return 
to the same node. The acyclic requirement ensures that 
inference from the posterior density contains no loops of 
circular logic. It is useful to represent complex inference 
problems, involving many potential sources of random- 
ness, with an equivalent directed acyclic graphical model. 
Although all the information about the model and data 
is expressed by writing down the joint probability den- 
sity explicitly, probabilistic graphical models serve as a 
useful visual representation of the structure of the hi- 
erarchical model and their interface with data. Formal 
graphical models have not been used before in SN la in- 
ference, and they are not prevalent in astronomy, so we 
provide a basic introduction below. Further bac kground 
and theory of g raphi cal models can be found in iBishopI 
enseni (l200lh and [Pearl (1988). 

The directed graph is constructed as follows. Each 
parameter or datum corresponds to a node (or vertex). 
Conditional relationships between nodes are encoded us- 
ing directed links or arrows (edges). Hence the joint 
probability of two variables P{x, y) — P{x)P(jj\x) is rep- 
resented hy X y. For obvious reasons, the parameter 
X is termed the parent and y is termed the child. More 
generally, in a high-dimensional problem, if there exists a 
directed path of any length between node x and another 
node y, then y is a descendant of x. The joint probabil- 
ity distribution over TV random variables 9i represented 
by a directed graph can be written as the product of the 



If a parameter is observed (and thus conditioned on) , 
its node is shaded. If the parameter is unknown and 
hidden, it is left open. The graph clearly distinguishes 
between the observed data and the hidden variables that 
are inferred. Graphical models are most useful in in- 
ference problems involving many potentially interacting 
components or sources of randomness. The complexity 
of the problem is reflected in the connectedness of the 
graph. The links between the nodes encode statements 
of conditional independence ( iR)) . 

2.3.1. Directed Graph for Model Training 

The directed graph corresponding to the global poste- 
rior density conditioned on the training set T) oi N SN 
la is shown in Figure [TJ The pathways from the roots 
to the data can be understood as a generative model for 
a data set of SN la light curves. At the highest hierar- 
chical level (far left), the population distributions for SN 
la and dust extinction are described by unknown hyper- 
parameters /x^, S^, and TA- At the next level, each su- 
pernova s draws intrinsic light curves tps and extinction 
values As from these populations as independently and 
identically distributed random samples. These hidden 
parameters combine with the supernova distance modu- 
lus /is to produce the observable light curve ^s , which is 
sampled with measurement noise to produce the photo- 
metric light curve data 'Dg- In addition, the (hidden) dis- 
tance modulus is associated with a redshift z^, observed 
with both measurement noise and peculiar velocity un- 
certainty. For a sample of SN la light curves, this process 
is replicated TVgN times. Our goal is to compute the joint 
posterior density of all the open nodes conditioned on the 
data in the shaded nodes. 

The conditional independence properties of the graph 
and model imply that, although the individual parame- 
ters of one SN are conditionally independent from those 
of a different SN, given the population hyperparameters, 
they are not marginally independent. The hidden hyper- 
parameters are unknown a priori; they must be learned 
from the data jointly with the individual SN parameters. 
Thus, the full graph does not factor into independent 
iVgN subgraphs. We must condition the whole graph and 
the global joint density on a database of many SN la light 
curves simultaneously rather than on each supernova in- 
dividually. 

Figure [T] shows that the SN la population hyperparam- 
eters /x^ , are conditionally independent from every 
other parameter and datum in the graph, given the in- 
trinsic SN parameters {tps}- P{^J'^|),'^^|)\ ■ i 2, , {tp s}) = 
P{li^,Ti^\{tps})- Here we use (•) to indicate all the 
other parameters in the global joint density that have 
not been denoted explicitly. Similarly, given all the 
extinction values of the supernovae, {As}, the extinc- 
tion population exponential scale is conditionally inde- 
pendent from all other parameters (and data), so that 
P{ta\-,V,Z,{As}) = P{ta\{As}). 

The graph also shows that tps, /^s and As are condi- 
tionally dependent in the posterior distribution, because 
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Fig. 1. — Directed acyclic graph for hierarchical Bayesian infer- 
ence from a training set of Type la SN light curves. This is a 
graphical representation of the joint distribution of unknown pa- 
rameters and observations for a training set of N SN la. Each 
parameter is represented by a node, and the links between node in- 
dicate relationships of conditional probability. The variables in the 
far left column are the hyperparameters which describe the pop- 
ulation probability distribution of supernova characteristics, and 
the population distribution of extinction values. The variables in 
the middle left column describe the distances, extinctions, and ab- 
solute light curves of individual supernovae. The variables in the 
middle right column are the observable parameters that describe 
the apparent light curves of individual SN la. The final column con- 
tains the observations of the redshifts and multi-band light curves 
of individual SN la. The open nodes describe unknown and hidden 
parameters, whereas the shaded nodes describe observed values 
that are conditioned upon in the posterior density. 

their descendant Vg is observed, even though they are a 
priori independent random variables. This dependency 
is reflects the tradeoffs involved in explaining the ob- 
served light curves as a combination of random fluctua- 
tions due to dust, intrinsic randomness of the absolute 
light curves, and distance uncertainties attributed to pe- 
culiar velocities. The Bayesian approach is not to pick 
out just one possible combination of the separate fac- 
tors, but to consider the probability distribution over 
the whole ensemble of hypotheses. 

Another consequence of this conditional dependency is 
that there are unblocked paths between the SN la pop- 
ulation hyperparameters, /i^ and and the dust ex- 
tinction hyperparameter ta- These paths pass through 
the conditionally dependent parameters A^, xps, and <ps 
for each supernova. Thus, the population hyperparame- 
ters are also conditionally dependent. This implies that 
posterior inferences of /x^, and those of ta cannot be 
separated. This is why we take the global approach, con- 
ditioning the global posterior density on the entire data 
set simultaneously, and exploring the complete joint pa- 
rameter space. 

The conditional independence structure implied by the 
graph depends neither on the choices of distributions 
made in Scction[221 nor on the particular functional light 
curve model that is assumed. We depicted the directed 
graph for inference with fixed Ry- If we wish to learn 
about Rv, it would become a random variable with a 
population distribution. Hence the graph would include 
nodes for each Ry and a node for the hyperparameters 
aji, with the appropriate links. 




Fig. 2. — Directed acyclic graph for training and prediction with 
Type la SN light curves. The rectangle depicts a plate representing 
the AfgN SN la in the training set. The tilde parameters describe a 
new supernova for which we seek to predict the distance modulus. 
The open nodes describe unknown and hidden parameters, whereas 
the shaded nodes describe observed values that are conditioned 
upon in the predictive posterior density. 

2.3.2. Directed Graph for Prediction 

The directed graph for the prediction task using data 
from a new supernova is presented in Figure [5) We de- 
pict the entire training set of supernovae on a plate which 
is understood to represent A^sn different instances. The 
quantities relevant to the prediction supernova are la- 
beled with tildes. The essential difference between train- 
ing and prediction is that in the training set we use dis- 
tance information from the redshift, whereas in predic- 
tion we do not. The task of prediction is to infer the 
joint probability density of the hidden quantities fl, A, 
and by fitting the light curve data V described by 
the observable parameters (p plus measurement noise. 
The unblocked paths between the training set and the 
prediction set depict how information from the train- 
ing set constrains the population hyperparameters (i.e. 
by informing the posterior density), which in turn pass 
that information (and its uncertainty) onto the predic- 
tion variables. The marginal predictive posterior density 
for the new supernova's distance modulus is obtained by 
integrating over the uncertainties in the population hy- 
perparameters fi^, S^, and TA-, and over the extinction 

A, magnitudes and the shape parameters, (p. 

2.4. Statistical Computation of the Global Posterior 
Density 

The global posterior probability density of all param- 
eters and hyperparameters of the full model conditioned 
on the training set database of SN la observations, Eq. 
I18[ is a function of many variables. Consider a minimal 
model that docs not account for dust extinction. We 
suppose it has one shape parameter 6, and models light 
curves in three filters. There are four observable param- 
eters, plus one for the distance modulus, for each super- 
nova. In addition, the hyperparameters /x^ and con- 
tain four plus ten variables (since the covariance matrix 
of the absolute magnitudes and light curve shape param- 
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eters must be symmetric). Suppose a minimal training 
set contained observations of forty SN la light curves in 
the three filters. The total number of variables, which 
is the dimensionality of the space over which the global 
posterior density is defined, is 214. Clearly, mapping the 
joint posterior on a rectangular multidimensional grid 
is intractable. Even a relatively crude grid, with only 
five points per dimension, would require more than 10^^^ 
evaluations of the posterior density. 

To address the complexities of hierarchical inference 
with realistic data sets, it would be practically useful 
to construct a statistical inference approach without ap- 
pealing to the asymptotic results from large-sample the- 
ory. The only way to account for all uncertainties in 
the model parameters consistently is to compute the full 
hierarchical joint density, Eq. 1181 estimate all indi- 
vidual parameters and hyperparameters simultaneously, 
conditioned on the entire SN la database. Marginal es- 
timates of parameters are obtained by integration over 
non-Gaussian uncertainties. However, the obstacles to 
this approach are two- fold: (1) we must compute the 
global posterior density in a parameter space with hun- 
dreds of dimensions, and (2) the marginal estimates of 
single parameters require integration of the posterior 
density over hundreds of other parameters. 

We tackle both of these problems by using stochastic 
simulation techniques to sample the full parameter space 
efficiently. In this section we describe the construction 
and operation of a Markov Chain Monte Carlo (MCMC) 
algorithm that takes advantage of the conditional inde- 
pendence structure evident in the directed acyclic graph 
(Fig. [T]) of Section 12.31 to sample the global posterior 
density, Eq. [H 

2.4.1. Metropolis-Hastings and the Gibbs Sampler 

Markov Chain Monte Carlo is a general and well- 
established technique for statistical analysis and is well 
suited for Bayesian computations. It is employed, 
for example, in CMB and joint cosmology analyses 
([Lewis fc Bridl e 2002; Tcemark et al. 2004), for fitting 
light curve models (Mandel & Agol 2002) to planetary 
transit observations (,Holman et al., 2006) . and for radial 
veloc ity analysis of extrasolar planetary systems (|Fordl 
[20051) . Since MCMC has not been used previously in 
SN la light curve inference methods, we briefly review 
some basic elements of MCMC to establish terminology. 
More thorough treatments o f MCMC methods and the- 
ory can be found el sewhere (jCilks et aLlflOOSl : lLiiill200l 
IGelman et all[200l . 

The purpose of an MCMC algorithm is to generate a 
Markov chain stochastic process that is irreducible and 
ergodic, and converges in probability to a stationary dis- 
tribution that is the same as the target distribution (the 
posterior density). Upon convergence, the probability 
that the chain is in a particular state is equal to the pos- 
terior density of the state, and the proportion of time 
the chain spends in a given region of parameter space 
is proportional to the posterior probability of that re- 
gion. Hence, MCMC can be used to generate many 
samples from an arbitrary, complex probability distribu- 
tion (which cannot be sampled from directly), and those 
samples can be used to represent the target distribution 
and compute characteristics of the distribution, such as 
means, modes, intervals and integrals. 



The cornerstone of many MCMC implementations is 
the Metropolis-Hastings algorithm. Suppose our target 
posterior probability density is P{6\T>) for a vector of 
generic parameters 0, and can be computed up to a nor- 
malization constant. The MCMC algorithm generates 
a sequence of samples. Let 0* denote the tth sample. 
We start with some initial estimate 0*^^, and generate 
subsequent values of the chain as follows. We select a 
proposal (or jumping) probability density Q(6*\0), giv- 
ing the probability of proposing 6* for the next value 
given that the current state is 6. This proposal den- 
sity is chosen so that it can be directly sampled (e.g. 
a Gaussian). If the current state is 0*, then we gener- 
ate a proposal 0* from Q{6*\6^). We then compute the 
Metropolis-Hastings ratio: 

_ p{e*\v)/Q{e*\e') 
p{e\v)/Q{et\e*) ^ ' 

The proposal 6* is accepted (0*+^ = 9*) with probability 
min(r, 1). If it is not accepted, the proposal is rejected 
and the next value of the chain is the same as the current 
value 0*"'"^ = 0*. In the next iteration a new proposal is 
generated from (5(0*|0*+^) and the algorithm repeats. 

A special case of Metropolis-Hastings is the random- 
walk Metropolis algorithm in which the proposal distri- 
bution is symmetric Q{6*\9) = Q{6\9*) and the pro- 
posal is centered around the current position, e.g. 9* ~ 
N{9,a'^I). Gibbs sampling is another very useful case 
of the Metropolis-Hastings rule and proceeds by simply 
drawing from the conditional probability of each block of 
parameters in turn, conditioning on the others as fixed, 
until all the parameters have been sampled. We can em- 
ploy Gibbs sampling in our hierarchical SN la framework 
because the model is built up from conditional relations, 
and many of the conditional posterior distributions can 
be directly sampled. Our BayeSN algorithm uses a com- 
bination of these strategies to generate efficient MCMC 
chains. 

2.4.2. The BayeSN Algorithm - Training 

We describe the BayeSN MCMC algorithm in the con- 
text of computing the global posterior in Eq. [TH] for 
fixed Rv- Let S = {{4>s, fj-s, Aj^}, fj,^,'E^,TA) be a vec- 
tor containing all the current values of the parameters 
and hyperparameters in the model (the "state" or po- 
sition). BayeSN utilizes a sequential Gibbs sampling 
structure that updates blocks of parameters in S. Af- 
ter a full scan (after all parameters have been given a 
chance to update), the current state of S is recorded as 
an MCMC sample. To begin a chain we populate S with 
a set of initial positions. The BayeSN MCMC algorithm 
works in two stages: a) sampling the population hyperpa- 
rameters conditional on the individual parameters, and 
b) sampling the individual supernova parameters condi- 
tional on the population hyperparameters. An outline of 
the Gibbs scan follows; more details are presented in the 
appendices fjBl SJC) 

1. Update the SN la population hyperparameters, 
fi^, S^, conditional on the current values {ips}, obtained 
from the current parameters: xps = 4>s — vfis — Ag. This 
is done by Gibbs sampling directly from the conditional 
posterior density P{ti^ , | • , 2?, Z) = P{ti^ , | {i>s}) ■ 

2. Gibbs sample the extinction population hyperpa- 
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rameter ta from the conditional density P{ta\-,'D, Z) — 
P{rA\{Aj,}). 

Next we update the individual supernova parameters, 
conditional on the population hyperparameters we have 
just sampled. The individual parameters of one super- 
nova are conditionally independent from those of an- 
other supernova, given the hyperparameters. We cycle 
s through the list of SN la, and for each supernova s 
we repeat steps 3a to 3c to update observable parame- 
ters for each passband F. Let <f>J^°, cf)J^'^ , and 4>J^^'^ 
denote all the observable parameters in <ps other than 
the apparent magnitude, linear shape parameters, and 
nonlinear shape parameters in F, respectively. 

3a. Gibbs sample the apparent magnitude Fq^s 
by drawing directly from the conditional posterior 

3b. Gibbs sample the linear shape parameters 6^ ^ in 
filter F by drawing directly from the conditional density, 

Pi^lJ 4>J^'^: f^s,As;fJ,^, S^, TA, Vs, Zs). 

3c. Random-walk Metropolis update the nonlinear 
shape parameters in band F, O^j^ using a jumping 

kernel Sj^^p ^ to move through the conditional density 

3d. Update the distance modulus /is using Metropolis- 
Hastings. We propose a new fis drawn from a Gaus- 
sian approximation to the conditional posterior density 
P{fis\ct)s,As;fi^,'E^,TA,'Ds,Zs), and use Metropolis- 
Hastings rejection to correct for the approximation. 

3e. Update the extinction Ajj using a random- 
walk Metropolis step along the conditional density 
P{Ah\ Ms; tJ'i>,'^i>,TA), with a jumping scale cr^ump,^- 

4. Steps 3a to 3e are repeated for all SN la in the data 
set. After all parameters have been updated, we record 
the current state of S as an MCMC sample, and return 
to step 1. After we have iterated n times we finish with 
a Markov chain S = {Si, . . .St, ■ ■ - Sn)- 

2.4.3. BayeSN Algorithm - Prediction 

The prediction mode of BayeSN follows essentially 
the same algorithm. We assume that the prediction set 
is sampled from the same population as the training set. 
This could be false, for example, if the SN la in the pre- 
diction set had extremely different observed light curves. 
This would also be false if either observational selection 
effects or progenitor evolution caused a distant predic- 
tion set to sample a different portion of the SN la pop- 
ulation, or a physically different population, that is not 
represented in the nearby training set. Training and pre- 
diction actually can be conducted simultaneously in a 
single run of the Gibbs sampler. The main distinction 
is we do not condition on the redshifts of the SN in the 
prediction set, i.e. the factor P{^s\zs) would be replaced 
by P{^is) oc 1 in step 3d above. With this change, the 
BayeSN algorithm will generate inferences on the graph- 
ical model in Fig. [2l for both the training and prediction 
set SN simultaneously. 

In many cases, however, we may wish to train the 
model on the training set SN once, and store the pos- 
terior inferences of the population hyperparameters. To 
make predictions for new SN, we would recall this infor- 
mation and repeatedly apply it to the new data, without 
updating the training posterior inferences. We can do 



this by making two changes to the above algorithm. The 
goal is to generate a Markov chain S p that samples the 
predictive posterior density. We assume we have already 
done a training MCMC and have a chain S that samples 
the training posterior density conditioned on the training 
set V, Z. Steps 1, 2 and 3d change to: 

IP & 2P. Draw the population hyperparameters \x^, 
and TA from the marginal posterior training density 
P{ii^,Yl^,TA\'D,Z'). This is easily done by picking a 
random sample from the training chain S and using the 
values of the hyperparameters in that sample. 

3dP. Gibbs sample the predictive [is from 
P{[is\(^s, 11^,^2^, TA,V,Z), omitting the factor 
P{pus\zs) since we do not condition on the redshift for 
prediction SN s. 

With these steps the algorithm is run to build up a 
Markov chain «Sp of samples from the predictive poste- 
rior density. 

3. CONSTRUCTING TEMPLATE MODELS FOR NEAR 
INFRARED SN lA LIGHT CURVES 

To complete the statistical model we must specify func- 
tional models for the normalized light curves, l^{t;6^), 
such that Fq + l^{t;6^) describes the apparent light 
curve. The function l^{t;6^) captures variations in the 
light curve shapes in the observed filters. There is sig- 
nificant freedom in defining these functions and they will 
generally depend on the application. For the remainder 
of this paper, we apply the hierarchical model described 
above to the SN la data in the JHKs NIR bands. In this 
section, we describe our methods for generating empirical 
template light curve models in the JHKs bands. 

In the H and Ks bands, we model the light curves us- 
ing maximum likelihood templates that assume the nor- 
malized light curves of all the SN are identical, at least 
between -10 and 20 days from maximum light in B. In 
the J-band, where the PAIRITEL photometry is better, 
we construct a light curve model that captures the shape 
variations, specifically the timing and amplitudes of the 
initial decline, trough, second rise, second peak and sub- 
sequent decline. This Flexible Light-curve Infrared Tem- 
plate (FLIRT) model is constructed over the range -10 to 
60 days past maximum light in B and depends on four 
light curve shape parameters to desribe the features of 
the J-band light curve. 

3.1. Maximum Likelihood Light Curve Templates 

The simplest possible light curve model assumes that 
the normalized light curve in filter F of all SN are iden- 
tical. The normalized light curve data from all SN are 
sampled with noise from an underlying light curve func- 
tion of the rest frame phase l^{t). A strategy for describ- 
ing this function is to construct a template that is defined 
by a set of knots {p^,t} and an interpolation method. 
The knots are defined such that l^{t = Ti) = pf , and 
l^{t) — s{t;T) ■ , where the vector s{t;T) is defined 
by an interpolation method that is linear in the knot 
values . We choose a natural cubic spline rule which 
ensures smooth ness in the t emplate up to two continu- 
ous derivatives (jPress et al.l i2007). It is convenient to 
choose the to lie on a regular grid with spacing At 
such that one of the tj = coincides with the reference 
time (TBmax), forcing the corresponding pj^ = 0, so that 
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l^{t = Tj = 0) = Pj' — for the normalized light curve. 

The Nf photometric data points from supernova s are 
sampled with Gaussian noise from the model template 
plus a constant apparent magnitude offset Fq . The joint 
posterior of the apparent magnitude offsets and the tem- 
plate for the data sets Vf consisting of measurements 
from iVsN supernovae in one band F is proportional to 
the likelihood: 

iVsN 

Pi{F^},p^\ pf }) (X n NimfllF^ + Sp^, Wf) 

(22) 

where S is a matrix with ith row equal to s{tf; r). The 
joint maximum likelihood estimates of the apparent mag- 
nitudes {Fq} and the template is obtained by max- 
imizing the log joint likelihood function subject to the 
linear constraint /(O) — pj — 0. This can be accom- 
plished using the method of Lagrange multipliers. The 
quadratic optimization problem can be solved easily us- 
ing non-iterative linear methods and yields a unique so- 
lution. The maximum likelihood template light curve 
model produced this way is defined as I (t) ~ s{t; t) -p^ 
for band F. The H and Kg bands templates are depicted 
in Fig. [3] and the values p^ are listed in Table 1 . 

3.2. Flexible Light-curve InfraRed Template 
3.2.1. Definition 

Although the J-band light curves are very similar in 
shape near the peak, past 10 days after Bmax, there are 
variations in the time and depth of the trough and the 
time and height of the second peak. This behavior may 
be explained by the change s in the ioniz ation states of 
iron in the SN atmosphere ()KaseiJ[2006f l. We describe 
an empirical model for J-band light curves that trans- 
forms with simple scalings to mimic the individual de- 
cline timescales, trough depths, second rise timescales 
and second peak amplitudes of each supernova light 
curve. These scalings are suggested by an inspection of 
the light curves. 

We posit a J-band fiducial normalized light curve /(A) 
that can be transformed to accommodate the variations 
in characteristic time scales and amplitudes observed in 




15 20 



Fig. 3. — Maximum likelihood templates (grey curves) with the 
H and Ks band data. The H and Ks normalized light curves of 
different SN are very similar between -10 and 20 days. 



individual J-band light curves. Here A is a feature time 
scale describing the timings of features in the fiducial 
light curve. We can map this time scale to the chrono- 
logical time scale i of a particular supernova by introduc- 
ing a time-warping function that allows the pre-trough 
phase to be scaled independently from the post-trough 
phase. The rate at which the feature time maps to the 
chronological time is: 



dt 
dX 



a, if A < At, 
/3, if A > At 



(23) 



where the parameters a, f3 are positive constants and of 
order one and At is the feature time of the trough in the 
fiducial light curve. The solution is t{X) = a min(A, At) -|- 
(3{X — At)"*" where = max(M, 0). This function can be 
inverted as 



A(t) = 



a-^t, 



/3-H + At(l-a//3), 



if t < aXt 
if t > aXt- 



(24) 



These equations represent simple transformations be- 
tween the chronological and feature time axes, or the 
"horizontal" dimensions. 

Even after adjusting for variations in the two 
timescales, there are still variations in the depth of the 
trough and the amplitude of the second peak. This sug- 
gests that an individual normalized light curve is related 
to the standard light curve /(A) as 

/"'(t;a,/3, d, r) 

^(d[f{X{t))-M, X{t)<Xt 
\d[f{Xt) - fo] + r[/(A(t)) - /(At)], X{t) > At 

(25) 

where /o = /(O) = 0, and the parameters d and r are 
positive constants of order one. The decline parameter d 
controls the depth of trough by scaling the decline from 
maximum light, "vertically" in the magnitude dimension. 
A larger d will produce a deeper trough and a faster de- 
cline rate (in magnitudes per day). At the trough the 
magnitude is J{Ttr) = Jo + c^[/(At) — /o] and rise pa- 
rameter r controls the rise in flux towards the second 
maximum relative to the trough magnitude. A larger r 
will produce a higher second peak and a faster rise rate. 
This parameterization is constructed to preserve continu- 
ity in the light curve even as different phases are scaled 
in amplitude. This quantitative parameterization of two 
constants (a, (3) to control time scales and two constants 
(d, r) to control amplitudes in two different regimes of 
the light curve is sufficient to describe the variation in J- 
band light curves. This is a simple transformation from 
the fiducial light curve to the realized light curves of in- 
dividual supernovae. 

After the fiducial light curve /(A) is solved as a contin- 
uous function of feature time A, one can easily measure 
any key features, such as the feature time of trough mini- 
mum (Af), the feature time of second peak (Ap2), and the 
normalized magnitudes at these points, /(Af ) and f{Xp2). 
For any particular supernova's J-band light curve, we can 
measure these features using the solved parameters. The 
chronological time of the trough is Ttr — t{Xt) = a At, 
the trough-to-second-peak time is T2 --Ttr — P{Xp2 — At), 
the depth of the trough is J{Ttr) — Jo — d [/(At) — fo] 
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and the height of the second peak above the trough is 
J{T2) - J{Ttr) = r [/(Ap2) - /(At)]. The parameters of 
the model can be directly related to observable features 
of hght curves. 

3.2.2. Maximum Likelihood Construction of the FLIRT 

model 

The FLIRT model described above is completely spec- 
ified by the fiducial normalized light curve function /(A). 
We represent this function in the same way as the Max- 
imum Likelihood Light Curve Template ( ij3.ip . A set 
of knots (p, t) is defined on a regular grid, such that 
/(A) = s(A; r) • p, where the vector s(A; r) is fully spec- 
ified by natural cubic spline interpolation. Once p is 
known, an individual supernova light curve can be fit- 
ted with the FLIRT model by means of nonlinear max- 
imization of the likelihood to get point estimates of the 
light curve shape parameters 9'^ — (d'* , r'* , a* , /S'* ) and 
apparent magnitude F^. All that is now required is an 
estimate of the fiducial template p. The joint posterior 
over the supernova parameters and the fiducial template 
is proportional to the likelihood function 

p({jj,0/},p| {pf}) cx n ^("^.'1 Jo + sie'jp, wf ) 

(26) 

where the matrix S{6^) is derived from the defining 
equations Eq. [Ml Eq. [^S] and the interpolation method 
s(A;t). If {Jq,0/} are estimated and fixed, then the con- 
ditional maximization of logP(p | { Jq, {2?j^}) with 
respect to the template p subject to the constraint that 
/(A = 0) — pj — IS a. linear problem. 

It is straightforward to solve for the FLIRT model 
template iteratively. We select a subset of SN light 
curves that are well sampled. First we pool the pho- 
tometric data together and estimate the Maximum Like- 
lihood Light Curve Template po f iJ3.1|) as a first ap- 
proximation. Then we fit the FLIRT model using this 
template to each SN light curve by conditional maxi- 
mization of P( Jq , 0f , I Po, {2?'^}) to get estimates Jq , 
Next we fix the SN parameters and update the fidu- 
cial template by constrained conditional maximization of 
P{p I { Jg, 6g}, {D^}) to get a new template pi. We iter- 
ate until the maximum likelihood template p converges. 
The template is rescaled so that the sample median val- 
ues of the fitted SN shape parameters (d, r, a, p) are equal 
to one (so that the fiducial template model, which has 
shape parameters equal to one, refiects a typical light 
curve). 

In practice, it is convenient to measure the relative 
decline rates d/a and the relative rise rates r//3 rather 
than the depths and heights directly. The light curve 
shape parameters are then O"^ = {d/a,r/P,a, /3). The 
maximum likelihood J-band FLIRT fiducial template 
p,j is listed in Table 1, and depicted in Fig. 21 which 
also shows the effects of varying each of the parame- 
ters. The trough of the fiducial template is located at 
(At, /(At)) = (14.43 days, 1.64 mag) and the second peak 
is (Ap2,/(Ap2)) = (29.55 days, 0.90 mag). 

In Fig. [51 we display the J-band FLIRT fiducial hght 
curve, along with the J-band photometry for the 39 SN 
listed in Table 2, shown in grey. We have also trans- 
formed each SN light curve data set using the fitted light 
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Fig. 4. — FLIRT model for J-band light curve shape varia- 
tions. In each panel, the fiducial FLIR, template with param- 
eters (d/a, r / (3, a, (3) = (1,1,1,1) is shown as the middle curve 
along with models with one parameter varied while keeping the 
others fixed. For example, the first panel depicts (0.7, 1, 1, 1) and 
(1.3, 1, 1, 1). The parameters correspond to the initial decline rate, 
the second rise rate, the time from peak to trough and the time 
from trough to second peak. 



TABLE 1 
JHKs FLIRT AND Max. Likelihood 
Templates 



T-To J -Jo H-Ho Ks-Kso 



-10 


0.74 


0.34 


0.48 


-5 


-0.11 


-0.18 


0.03 





0.00 


0.00 


0.00 


5 


0.44 


0.09 


0.22 


10 


1.36 


0.21 


0.40 


15 


1.63 


0.08 


0.30 


20 


1.44 


-0.13 


0.18 


25 


1.15 






30 


0.91 






35 


1.33 






40 


1.82 






45 


2.23 






50 


2.60 






55 


2.91 






60 


3.29 







Note. — All templates arc interpolated 
using natural cubic splines. 

curve parameters in Table 2 to the same scales as the 
fiducial light curve by inverting Eq. 1251 The dramatic 
reduction of dispersion from 5 to 60 days shows that the 
FLIRT model successfully captures the shape variations 
in the J-band SN la light curves. The double-peaked 
light curve structure is also seen in the H, K and / bands. 
In the future, it may be worth exploring FLIRT models 
in those bands. 

4. APPLICATION AND RESULTS 

4.1. Nearby SN la NIR Light Curves 

A comprehensive data set of nearby SN la light 
curves in the near infrared was compiled by WV08, 
including observations of 21 recent SN with the 
Peters Automated InfraRed Imaging TELescope 
(PAIRITEL) taken by the CfA Supernova Group 
and observations of 23 SN from the literature 
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J (39 SNe la) 




20 30 
Time Since T„ 

Bmax 

Fig. 5. — The J-band light curve data (blue squares) exhibits 
significant shape variation after the initial decline. After fitting 
for these variations using the FLIRT model, the data was trans- 
formed to the fiducial frame and overplotted (black dots) with the 
fiducial FLIR template light curve (black line). This demonstrates 
that the FLIRT model successfully captures these light curve shape 
variations. 



(iJha et al.lll999t [Hernandez et al. I2000t iKrisciunas et al.l 



200C/: 'P i Paola et all "l200p^ 



Krisciunas et al^ 
Elias-Rosa et al. 



20011 



2001 



Vale ntini et al.''" 
2004a.b, 



"'2003 
2007 



2006, 2008' 'Pastore llo et all 12007 : 
Stanishcv e t al. ^07; Pi gnata et al.. 200^ . Of these, 
three (SN 2005bl, SN 2005hk, and SN 2005ke) are 
omitted because they are fast-declining, peculiar SN 
with "dromedary" 7J-band light curves that have only 
one peak, whereas most i7-band light curves are "bac- 
trian," having two peaks. We use the remaining data set 
with two exceptions. The very late J-band secondary 
maximum of SN 2002cv, and its extre me reddening and 
estimated optical extinction {Ay > 8) (|Elias-Rosa et al.l 
l2008f ) suggest this light curve is unusual, so we have 
omitted it from the analysis. We have also omitted 
the PAIRITEL observations of SN 2005eu, because we 
judged the JHKs image subtractions to be of poor 
quality. The final, edited set of observations covers 39 
SN la. We have only used photometric measurements 
with signal-to-noise ratio > 3. Extensive studies of two 
S N in this set, SN 2005cf and SN 2006X, were presented 
bv lWang et all (|2008, 2009). 

To construct the H and iCg-band templates (Fig. [31 
Table 1) we used all the light curve observations from the 
data set. For the J-band light curves we selected a subset 
of 16 well-sampled light curves to generate the fiducial 
FLIRT template (Fig. [1 Table 1). This subset consisted 
of SN1998bu, SN 1999ee, SN 2001bt, SN 2001cn, SN 
2001CZ, SN 2002bo, SN 2005el, SN 2005eq, SN 2005na, 
SN 2006D, SN 2006N, SN 2006X, SN 2006ac, SN 2006ax, 
SN 20061c, and SN 200611. 

All photometric data were K-corrected to the SN 
rest frame by linearly interpolating the tables of 
IKrisciunas et al.l (|2004bl ). and registered to a common 
phase by subtracting from the Julian Day the time 
of _B-band maximum, Tsmax, as determined by the 
MLCS2k2 fits to the optic al light curves obser ved by 
the CfA Supernova Group (jHicken et al.ll2009aD . The 
phases were corrected for time dilation using the he- 



liocentric redshifts. Recession velocities were corrected 
to the CMB-I- Virgo infall rest frame, as described in 
WV08. Furthermore, a peculiar velocity uncertainty 
150 km s"^ (|Radburn-Smith et al.|[200l was as- 
sumed. Luminosity distances were computed from the 
redshifts assuming an LCDM model with flM = 0.27, 
Oa = 0-73 and a Hubble constant scale of /i = 0.72 
(jFreedman et al.ll2001l: [Spergel et al.ll2007D . At the most 
distant end of the sample at z « 0.04, the relative differ- 
ence between the luminosity distances in LCDM and in 
an Einstein-de Sitter universe is 2%. 

4.2. JHKs Light Curve Model Specification 

The light-curve models we construct for the JHKs 
data set consist of maximum likelihood templates for H 
and Ks between -10 and 20 days ( W3.ll Fig. [3]) and the 
J-band FLIRT model between -10 and 60 days [MM Fig. 
[3]) with At = 5 days. The H and Kg models have no light 
curve shape parameters, and the J-band FLIRT model 
has four: 9^ = 0, 6»^= = 0, and 6^ ^ {d/a,r/P,a,P). 
The niultiband normalized light curve models as defined 
in Eq. [21 arc then fully specified by 

H{t) ^Ho = l^{t) = fnit) = S{t; r) ■ p" , (27) 

Ks (t) - Kso = {t) = fK, (t) = S(t ; r) • p''^ , (28) 

J{t)-Jo = l({t;ei^)-ei (29) 

where the J-band linear parameters are — {d/a, r/f3), 
the nonlinear parameters are 6^j^ ~ [a, /3), and the vec- 
tor function Zf {t; ^nl) determined by Eq. |24| and Eq. 

m 

In the notation of the hierarchical framework de- 
scribed in §2, the observable or apparent parameters 
are (ps = [Ja, Ho, Kso, d/a,r / /S, a, (3) for each for su- 
pernova s, and the intrinsic or absolute parameters are 
ips = {Mj,MH,MK,,d/a,r/(3,a,(3) for each supernova 
s. The population hyperparameters are fi^ = ^[ips] and 
S0 — Gov[ips,ipJ] with expectations with respect to the 
SN la NIR light curve population randomness. 

Since dust extinction and reddening have small effect 
on the NIR light curves in our sample, we omit the full 
modeling of the multiband extinctions As and dust popu- 
lation characteristic ta- The most optically reddened SN 
in the sample (SN 1999cl, 2006X, and SN 2003cg) are also 
at low redshifts, where the adopted velocity model gives 
them little weight in the determinations of population 
means and covariances of the NIR absolute magnitudes. 
Hence we set all As to zero and use the one-population 
model for SN la NIR light curve randomness only. In 
i j4.5l we estimate the potential effect of dust on our pos- 
terior inferences. In the near future, we will use the full 
two-population model with NIR and optical data for a 
simultaneous hierarchical modeling of SN la light curve 
shapes and dust extinction. 

After plugging the specified JHKs light curve models 
and parameter dependence into the hierarchical frame- 
work of §2, we perform probabilistic inference using the 
BayeSN algorithm of §2.4 to compute the joint posterior 
density over all individual parameters for the 39 SN and 
population hyperparameters. There is a total of 347 pa- 
rameters and hyperparameters in the statistical model. 
Initial positions for the Markov chains were obtained by 
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adding random noise to the MLE estimates of the SN 
parameters obtained in §3. It is not necessary to specify 
initial guesses for the hyperparameters. We set the scale 
of the inverse Wishart hyperprior, Aq — eol by choosing 
a small value eg — 10^"*. We found our inferences were 
insensitive to ten-fold changes in eo- 

The BayeSN MCMC algorithm was run for 5 inde- 
pendent chains with 2 x 10'* samples each. The Gelman- 
Rubin statistic was computed for all parameters: the 
maximum value was 1.03 and 99% had values less than 
1.02, with the mean and median values less than 1.005. 
Acceptable values of the G-R statistic are typically less 
than 1.10 (jCelman et al.|[2003.V The first 2000 samples 
of each of the chains were then discarded as burn-in and 
the chains were concatenated for posterior analysis. We 
found that our inferences were insensitive to the burn-in 
cutoff if it was greater than ^ 1000 samples. 

4.3. Posterior Inferences 

The MCMC results produce samples from the global 
posterior density over all parameters and hyperparam- 
eters, Eq. [THl We summarize the posterior density 
by examining marginal posterior densities over sub- 
sets of parameters. Inferences at the level of individ- 
ual supernovae can be summarized by the probability 
density P{(psT ^J■s\^^, 2) for each supernova s. Infer- 
ences at the SN la NIR population level are summa- 
rized by P{fi^,'E^\V,Z). This can be further bro- 
ken down into the marginal densities over mean proper- 
ties of absolute light curves T), Z), the probability 
over covariances between multiband absolute magnitudes 
P(S[(Mj, Mh, MkJ, {Mj, Mh, MkM'D, Z), marginal 
densities the covariances in light curve shape: P(S(0, 0)) 
and marginal posterior densities over covariances be- 
tween light curve shape and absolute magnitudes: 
P{Y.[{Mj , Mr, MK^,0]\V, Z). These posterior densi- 
ties are integrals over the global posterior density and 
can all be computed easily and directly from the MCMC 
chain. We show example light curve data and model fits 
in Figs. H El andlS 

4.3.1. SN la JHKs Light Curves 

The univariate marginal posterior median and stan- 
dard deviations of the individual SN light curve param- 
eters cj)s = {Jq, Hq, KsQ,d/a,r/ f],a, P) for each of the 
39 SN are listed in Table 2. The light curve fits are ex- 
cellent, especially in the J-band, where the PAIRITEL 
photometry is the best. The MCMC chains quickly find 
the region of parameter space near the peak of the poste- 
rior probability distribution, especially if the data tightly 
constrain the light curve fits. 

The SN light curve data in the training set is not ho- 
mogeneously well sampled. Some SN, e.g. SN 2006X, 
have extremely good sampling in JHKs from before 
maximum to well past the secondary maximum. Such 
well sampled, complete data sets constrain the observ- 
able light curve parameters very well. Other SN are 
sparsely sampled or have incomplete coverage over the 
range of the model, for example SN 1999gp, SN 2007cq, 
SN 2005ao. Some SN, for example SN 1998bu, are weU 
sampled in the early part of the J-band light curve but 
the measurements stop before the secondary maximum. 
Several SN in our sample (SN 1999ee, SN 2000bk, SN 



2000ca, SN 2005iq, and SN 2007cq) have no i^,-band 
data. 

In these cases the advantages of the Bayesian approach 
are clear. Since we have defined a joint probability den- 
sity over all data and parameters (both of which are con- 
sidered random variables) in Eq. [THl we have a proba- 
bility distribution over the parameters that are not well 
constrained by the individual SN data because of missing 
observations. The Bayesian computation yields sensible 
numerical estimates of the poorly constrained parame- 
ters and their uncertainties using the joint probability 
over the observed parameters, the population distribu- 
tion of individual SN parameters, and the uncertainty in 
the hyperparameters of the population distribution, all 
conditioned on the actual observed data and its uncer- 
tainty. For the very well-sampled, complete light curves, 
the posterior density over SN light curve parameters will 
be dominated by the information from its own light curve 
data. An example of this is SN 2006X, shown in Fig. [H] 
along with the light curve fits. For sparse light curves, for 
example, SN 2006cp (Fig. [7]), some of the parameters will 
be informed by the population distribution constrained 
by the whole training set of SN. In an intermediate case 
(e.g. SN 2005cf, Fig. [SJ an incomplete hght curve), a 
balance between the existing data, observed parameters 
and the population distribution of the poorly constrained 
parameters is achieved, and an appropriate uncertainty 
is computed. These computations are already handled 
automatically by our sampling of the global posterior 
probability density. 

4.3.2. NIR Absolute Magnitudes 

A summary of posterior inferences of the SN la NIR 
light curve population hyperparameters is presented in 
Table 3. The univariate expectation of the means /.t0 is 
shown along with the standard deviations of the univari- 
ate marginal densities. We list the modal values of the 
square root of the variances cr^ , which are the diagonal 
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Fig. 6.— JHKs light curve data and model fits to SN 2006X. 
This is a very well sampled light curve, and the fit to the light 
curve model (black curves) is excellent. The H and Ks bands are 
fit to the maximum likelihood templates, and the J- band is fit to 
the FLIRT model. The data of this SN tightly constrain the light 
curve parameters. 
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TABLE 2 

Posterior Summary of SN Ia JHKs Light Curve Parameters 
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Time of maximum liglit in B: Julian Date - 2,400,000. Jo, Ho, K^o arc ,/, H 

^ Reference codes: WV 08 IWood-Vasev et al.l (PAIRI TEL pl iotometry : 2008): 

IKrisciunas et al.l II2000I): KOI: IKrisciunas et al.l ifSooIl); V03: IValentini ct al. I200'3): K03: iKrisciunas ct al 



and Jf 3 at T b 



J99: 



j2004bD: K04c: 
IStanishev et al 



et al. (1999); HOO: Hernandez ct al. (20 0j); KOO: 

^ ( 2003); K04b: Krisciunas et aLl 

Krisciunas ct aT (2004c'); K07: ' Krisciunas et al.l 1 120071) : ER06: lElias-Rosa et al.l (I2006f) : Pa07: IPastorello et al.l (120071) : St07: 



((2003); P08: Pignata ct al.. (,2008h . 



TABLE 3 

Summary of Posterior Inference: Population Hyperparameters 





Mj 


Mh 


Mks 


d/a 


r//3 


a 


/3 




-18.25 (0.03) 
0.17 (0.03) 


-18.01 (0.03) 
0.11 (0.03) 


-18.25 (0.04) 
0.19 (0.04) 


0.95 (0.03) 
0.15 (0.03) 


0.97 (0.03) 
0.13 (0.03) 


1.04 (0.03) 
0.15 (0.02) 


1.01 (0.04) 
0.22 (0.04) 



p{Mh, 

p{d/a, 
P{r/I3, 
P{a, •) 

pW, ■) 



1.00 
0.73 (0.03) 
0.41 (0.09) 
-0.14 (0.29) 
0.52 (0.03) 
-0.07 (0.46) 
-0.28 (0.18) 



0.73 (0.03) 
1.00 

0.53 (0.04) 
-0.03 (0.47) 

0.59 (0.03) 

0.24 (0.20) 
-0.21 (0.35) 



0.41 

0.53 (0.04) 
1.00 
-0.16 (0.34) 

0.76 (0.01) 

0.07 (0.39) 
-0.48 (0.11) 



-0.03 (0.47) 
-0.16 (0.34) 
1.00 

0.55 (0.02) 
-0.77 (0.00) 

0.07 (0.41) 



0.59 (0.03) 
0.76 (0.01) 
0.55 (0.02) 
1.00 
-0.50 (0.02) 
-0.43 (0.07) 



0.24 (0.20) 
0.07 (0.39) 
-0.77 (0.00) 
-0.50 (0.02) 
1.00 
-0.04 (0.47) 



-0.28 
-0.21 (0.35) 
-0.48 (0.11) 
0.07 (0.41) 
-0.43 (0.07) 
-0.04 (0.47) 
1.00 



Note. — (top) Population means and variances of the absolute parameters. Values in the parentheses are the standard 
deviations of the marginal posterior density in each parameter. The estimates of the cr(.) are modal values, (bottom) 
Population correlation matrix for the absolute parameters. Estimates of the correlations ■) are the modal values. The 
parentheses contain the tail probabilities as described in Eq. 1301 
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Fig. 7. — JHKs light curve data and model fits (black curves) 
to SN 2006cp. The light curve data is sparse and incomplete. The 
BayeSN method estimates the J-band light curve where the data 
is missing using the information in the population distribution of 
the set of SN and its uncertainty. For example, the correlation 
of the initial decline rate with the second rise rate provides some 
information. Since the population of J-band light curves exhibits 
significant late-time shape variations, the late- time model fit is 
very uncertain, as reflected by the grey error tube spanning the 
16% and 84% quantiles of the posterior uncertainty in the light 
curve. 




Bmax' 



Fig. 8. — JHKs light curve data and model fits (black curves) to 
SN 2005cf. The light curve data is adequately sampled in the early 
part of the light curve up to second rise, but ends before reaching 
the second peak and decline. The BayeSN method estimates the 
second maximum and late-time decline using a combination of the 
constraints imposed by the data and the population distribution 
of the training set. For example, the final data point provides a 
lower bound for the time of the second maximum. This makes the 
posterior distribution of the /3 parameter non-gaussian. 

values of the covariance matrix S^,, and the standard 
deviations of their univariate marginal posterior prob- 
ability densities. We also list the modal values of the 
correlations p{-,-) obtained from the off-diagonal terms 
of the covariance after factoring out the variances. 
The marginal modes are estimated from the histogram 
of MCMC samples in each quantity. In addition we list 



Ptail 



^ rP(p<0), 

\f(p>o), 



if mode(p) > 0, 
if mode(p) < 0. 



(30) 



The smaller the tail probability, the greater the evidence 
that the correlation is different from zero, either posi- 
tively or negatively. The probability densities of correla- 
tion coefficients have support between -1 and 1 and are 
typically asymmetric. The probability densities of vari- 
ance parameters are also non-gaussian, since they are 
forced to be positive, and have fat tails towards higher 
variance. This captures the intuition that for a finite 
sample with fixed scatter, it is more difficult to discount 
the hypothesis that it arose from a high- variance distri- 
bution rather than a low variance one. 

The population mean absolute magnitudes are 
fj,{Mj) ^ -18.25 ± 0.03, fj-iMn) = -18.01 ± 0.03, 
and ^{Mks) ~ —18.25 ± 0.04 mag (on the scale of 
h — 0.72), and the population standard deviations are 
a{Mj) = 0.17 ± 0.03, a{MH) = 0.11 ± 0.03, and 
(t{Mks) = 0.19 ± 0.04 mag. In Figure [H we show the 
bivariate joint posterior density of the mean and vari- 
ance for the absolute magnitude in each band, and the 
bivariate modal values. The skews in the posterior densi- 
ties for the variances are visible. The absolute magnitude 
in the if-band clearly has much less intrinsic dispersion 
than in the J- and K-h&nd and is the best constrained. 
We have used bivaraite kernel density estimation with 
the MCMC samples to compute the 68% and 95% high- 
est posterior density contours and the mode, shown in 
the figure. 

Figure [10] shows the marginal posterior estimates of 
the individual SN H and J absolute magnitudes (ob- 
tained from, e.g. P(Jq — /is| 2?, Z)) plotted with contours 
representing the 68% and 95% probability contours of 
the bivariate population density P{Mj, Mh \ fJ-^p, S^) es- 
timated using the modal values of the covariance matrix 
(Table 3). We also show the marginal posterior den- 
sity of the correlation coefficient for the pair of absolute 
magnitudes. 

We see that the absolute magnitudes in J and H are 
highly correlated {p w 0.73) with strong evidence for 
positive correlation {P{p > 0) > 0.97). The data also 
suggest that intrinsically brighter SN are typically bluer 
in J — H color. Interestingly, this parallels the "brighter- 
bluer" relation seen in optical light curves (e.g. lGuv et al.l 
[2005; Jha ct al. 2006). There is also evidence for positive 
correlations between Mj, Mks and Mh, Mks, although 
the modal correlations are weaker {p « 0.4 — 0.5, Table 
3). 

4.3.3. Statistical Structure of J-band Light Curve Shapes 

We examine the statistical relationships between the 
different features of the J-band light curve. We focus 
only on those correlations which are significantly differ- 
ent from zero, as measured by the tail probabilities of 
the posterior distribution (Table 3). 

The peak-to-trough initial decline rate, as measured 
by d/a, is moderately correlated {p « 0.55) with the 
trough-to- second peak rise rate, as measured by r/f3. 
The posterior probability of a positive correlation is 98%. 
This trend indicates faster pre-trough declines lead to 
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Fig. 9. — Joint posterior probability densities in the population 
mean and population variance of the peak absolute magni- 
tudes in each NIR band. The crosses and the numbers in each panel 
denote the mode of the bivariate probability density. The con- 
tours contain the 68% and 95% highest posterior density regions. 
These estimates were obtained directly from the BayeSN MCMC 
chain of the trained statistical model. The univariate marginal 
estimates of the population variances are: a(Mj) = 0.17 ± 0.03, 
o-(Mh) = 0.11 ± 0.03, and (t{Mks) = 0.19 ± 0.04. 




Fig. 10. — Strong correlation p between J- and H- band peak lu- 
minosities. The grey ellipses contain 95% and 68% of the bivariate 
population probability distribution using the modal values of the 
population covariance. The straight lines indicate sets of constant 
J — H color. There appears to be a trend that bluer J — H objects 
are also intrinsically brighter. The marginal posterior probability 
density of the correlation coefficient obtained via MCMC is shown 
in the inset. The mode is p = 0.73, and P{p > 0) = 0.97 is obtained 
by numerical integration of the marginal density. 

faster post-trough rises. There is a strong correlation 
(p « —0.77) in the early light curve between the ini- 
tial decline rate and the time to trough (a), demonstrat- 
ing that slower declining light curves have later troughs. 
There is a moderate {p w —0.43) correlation between the 
post-trough rise rate and the trough-to-second peak time 
(P), suggesting that in the post-trough phase, slower ris- 
ers have later secondary maxima. The posterior proba- 
bility of a negative correlation is 93%. As shown in Table 
3, there is no correlation between the early time scale (a) 



and the late, post-trough time scale (/?). 

4.3.4. Statistical Correlations between NIR Absolute 
Magnitudes and Light Curve Shape 

Statistical correlations between peak SN la absolute 
magnitudes and light curve shape are of paramount im- 
portance to cosmological studies, because they relate the 
intrinsic luminosity, a hidden, intrinsic parameter, to a 
distance-independent observable measure. Relations be- 
tween optical light curve shape and optical luminosity 
have been leveraged to improve the utility of SN la as 
standard candles and distance indicators. We present 
the first quantitative search and measurement of corre- 
lations between near infrared absolute magnitudes and 
light curve shape as measured from the J-band light 
curves. Again, we only highlight correlations having the 
highest posterior probability of being non-zero as mea- 
sured from the tail probability. 

Figure [TT] show the Mj and r//3 estimates for individ- 
ual SN together with 68% and 95% probability contours 
of the population density P{Mj,r/(3\ /x^, S^) using the 
expected posterior estimate of the population mean and 
the modal covariance matrix. The most likely correlation 
is moderate (p w 0.52). The evidence for a positive cor- 
relation is fairly strong: P{p > 0) = 97%. This demon- 
strates that brighter SN la J-band light curves are likely 
to rise more slowly to the second maximum. 

Figure [121 shows a moderate correlation {p « 0.59) of 
J-band rise rate with the iJ-band luminosity. There is 
good evidence for a positive correlation {P{p > 0) = 
0.97). Figure [13] shows that the JC^-band luminosity 
has a fairly strong correlation with the J-band rise rate 
(p w 0.76) with strong evidence {P{p > 0) = 99%) for 
a positive relation. In these bivariate plots we have only 
shown individual SN with posterior uncertainty smaller 
than the population width in each parameter. The fully 
Bayesian calculation properly accounts for the uncertain- 
ties in the parameters of individual SN when determining 
the posterior density of the population correlation. 

Taken together these correlations suggest that SN la 
light curves brighter in the NIR at peak have slower rates 
of evolution at later times, as measured from the J-band 
light curve. A larger sample of SN la light curves in 
the NIR is needed to confirm these correlations. The 
best measured light curves tend to be at lower redshifts, 
where peculiar velocity uncertainties make the absolute 
magnitudes highly uncertain. Supernovae farther out in 
the Hubble flow have better determined absolute magni- 
tudes but are likely to have poorer quality measurements 
of the whole light curve evolution. Continued monitoring 
of local SN in the NIR over a wide range of distances and 
redshifts will help to solidify our estimates of these corre- 
lations (by narrowing their posterior probability densities 
and providing better estimates of the modal correlation 
coefficients). 

Using the trained statistical model, we can estimate 
the expected precision of distance prediction for different 
types of light curve observations represented by subsets 
of the observable light curve parameters in <ps- Suppose 
the observable vector (ps of a new hypothetical SN can 
be partitioned into the observed parameters and the un- 
observed parameters (ps = (0°,^"). We have computed 
the variance of the predictive jls conditional on <p°, and 
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Fig. 11. — Evidence for moderate correlation between the J-band 
second rise rate and the J-band peak luminosity. The grey ellipses 
contain 95% and 68% of the bivariate population probability dis- 
tribution using the modal values of the population covariance. The 
inset shows the marginal posterior probability density of the cor- 
relation coefficient obtained via MCMC, along with the mode and 
probability of positive correlation with the absolute magnitude. 
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Fig. 12. — Moderate correlation between the J-band second rise 
rate and the J/-band peak luminosity. The grey ellipses contain 
95% and 68% of the bivariate population probability distribution 
using the modal values of the population covariance. The inset 
shows the marginal posterior probability density of the correlation 
coefficient obtained via MCMC, along with the mode and proba- 
bility of positive correlation with the absolute magnitude. 

marginalizing over 0" and the posterior uncertainty of 

the hyperparameters fi^ , for various partitions of (ps ■ 
We find that the statistical model implies the following 
properties: (1) If one only observes the light curve around 
Tsmaxi then the single most valuable measurement is the 
7J-band apparent magnitude, providing a distance mod- 
ulus precision of ~ 0.14 mag; the J- and if^-bands do not 
add much more statistical power. (2) If one monitors the 
J-band light curve at late times to measure the second 
rise to the secondary maximum, the moderate correlation 
of the rise rate with absolute magnitudes can be used to 
decrease the uncertainty towards ~ 0.1 mag. In the next 
section, we will test the predictive performance of the 
model using the NIR SN la sample. 
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Fig. 13. — Evidence for correlation between the J-band second 
rise rate and the iCs -band peak luminosity. The grey ellipses con- 
tain 95% and 68% of the bivariate population probability distri- 
bution using the modal values of the population covariance. The 
inset shows the marginal posterior probability density of the cor- 
relation coefficient obtained via MCMC, along with the mode and 
probability of positive correlation with the absolute magnitude. 

4.4. The Hubble Diagram of JHKg SN la Light Curves 
4.4.1. Hubble Residuals and Training Error 

We list in Table 4 the redshifts and several estimates 
of the distance moduli to the SN la in our training 
set. We list the redshift-based LCDM Hubble flow dis- 
tance and its uncertainty, described by Eq. [8l on the 
Ho ~ 72 km s^^ Mpc~^ scale. This describes the factor 
P{^s\zs), conditioning on the redshift only, and incorpo- 
rating the assumed CTpcc = 150 km s~ peculiar velocity 
uncertainty and the redshift measurement error. As a 
product of the Bayesian treatment, we obtain the pos- 
terior estimate of the distance modulus, combining the 
redshift information with the statistical light curve model 
and conditioning on the entire dataset to generate an 
"information update." This is expressed as the marginal 
posterior probability P(^s|I?, 2^). The mean and stan- 
dard deviation of this probability density are listed as 
/ipost and CTpost for each supernova. 

The typical measure of the quality of a model for SN 
la standard candles is the average residual in the Hubble 
diagram. First, the redshift-based distance moduli and 
photometric light curves are used to "train" a statisti- 
cal model, by determining the mean absolute magnitude 
and variance, and perhaps relationships between absolute 
magnitude and light curve shape. Once these parameters 
of the statistical model are determined from the training 
set data, the photometric light curves are fed into the 
model without the redshift-based distances to "predict" 
standard candle distances using the model. These new 
distances are compared to the Hubble flow distances to 
calculate the average residual error. 

This measure of "training error" may be called the re- 
substitution error because it involves using the redshifts 
and light curve data of the training set to train the model 
parameters, and then the resubstituting the light curve 
data back into the model to produce distance "predic- 
tions" as if the light curve data were new. 

In our fully Bayesian formulation, the process of "train- 
ing" corresponds to computing the (non-gaussian) pos- 
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TABLE 4 

SN lA NIR Distance Modulus Estimates 



SN 


cz ^ 




t^LCDM\z 




)3 

P-post 


(^p ost 


Mresub 


•^resub 


Mpred 


^pred 


'-''prcd 




[km s^-*-] 


[km s^-^] 


[mag] 


[mag] 


[mag] 


[mag] 


[mag] 


[mag] 


[mag] 


[mag] 


[mag] 




709 


20 


30.00 


0.46 


29.84 





10 


29.85 


0.09 


29.84 


0.09 


0.12 




957 


86 


30.62 


0.39 


30.95 





10 


30.99 


0.08 


31.00 


0.12 


0.14 


aiN iy9ycp 


2909 


14 


33.05 


0.11 


32.98 





08 


32.94 


0.14 


32.94 


0.04 


0.15 


SN 1999ee 


3296 


15 


33.32 


0.10 


33.23 





06 


33.20 


0.05 


33.16 


0.07 


0.09 


biN lyyyek 


5191 


10 


34.32 


0.06 


34.36 





06 


34.45 


0.14 


34.47 


0.06 


0.15 


OiN 1999gp 


8113 


18 


35.30 


0.04 


35.28 





04 


35.10 


0.20 


35.05 


0.06 


0.20 


blNzUUUli; 


1803 


19 


32.00 


0.18 


31.89 





06 


31.89 


0.07 


31.86 


0.09 


0.11 


Q IVT o n n n V, 1-, 
olN ZDUUbn 


6765 


21 


34.90 


0.05 


34.91 





04 


34.90 


0.04 


34.88 


0.04 


0.06 


biNzUUUbk 


7976 


20 


35.27 


0.04 


35.27 





04 


35.34 


0.08 


35.57 


0.13 


0.15 


blNzUUUca 


6989 


62 


34.97 


0.05 


34.92 





05 


34.78 


0.10 


34.75 


0.05 


0.11 


blN zUUUce 


5097 


15 


34.28 


0.06 


34.28 





06 


34.28 


0.12 


34.24 


0.09 


0.14 


bINzUUiba 


8718 


22 


35.46 


0.04 


35.48 





03 


35.53 


0.08 


35.51 


0.06 


0.10 


blNzUUibt 


4220 


13 


33.87 


0.08 


33.84 





05 


33.83 


0.05 


33.82 


0.06 


0.08 


blNzUUicn 


4454 


250 


33.97 


0.14 


33.84 





05 


33.83 


0.06 


33.86 


0.05 


0.08 


bINzUUiCZ 


4506 


20 


34.00 


0.07 


33.94 





06 


33.88 


0.12 


33.85 


0.04 


0.12 


blNZUUiel 


y / o 




oU. / U 


U.oo 


Q 1 nc 





07 


Q 1 in 
oi. iU 


U.Uo 


oi.l i 


U.Uo 


n no 
U.Uo 


blN ZUUzbo 


1696 


20 


31.88 


0.19 


32.19 





06 


32.19 


0.05 


32.21 


0.11 


0.12 


bINzUUzaj 


2880 


22 


33.03 


0.11 


32.95 





05 


32.94 


0.05 


32.92 


0.07 


0.09 


blNzUUoCg 


1340 


24 


31.37 


0.25 


31.92 





07 


31.97 


0.06 


31.97 


0.18 


0.19 


biNzUUoau 


2206 


14 


32.44 


0.15 


32.58 





10 


32.61 


0.17 


32.60 


0.08 


0.19 


biN2UU4b 


2607 


16 


32.81 


0.13 


32.96 





08 


33.01 


0.06 


33.07 


0.12 


0.14 


biNzUU4eo 


4859 


17 


34.17 


0.07 


34.05 





05 


33.93 


0.10 


33.92 


0.05 


0.11 


blNzUUoao 


11828 


126 


36.14 


0.04 


36.15 





04 


36.20 


0.10 


36.40 


0.20 


0.23 


bJNzUUoci 


2018 


11 


32.24 


0.16 


32.14 





08 


32.13 


0.06 


32.10 


0.09 


0.11 


blN ZUUocn 


8094 


1499 


35.30 


0.40 


35.26 





10 


35.27 


0.11 


35.25 


0.06 


0.12 


oiNiUUoel 


4349 


8 


33.93 


0.08 


33.91 





05 


33.90 


0.03 


33.85 


0.08 


0.08 


biNZUUoeq 


8535 


25 


35.41 


0.04 


35.40 





04 


35.39 


0.05 


35.26 


0.15 


0.16 


(JIN ^UVJ'JIK^ 


10102 


40 


35.79 


0.03 


35.79 





03 


35.86 


0.17 


35.81 


0.06 


0.18 


SN2005na 


7826 


26 


35.23 


0.04 


35.22 





04 


35.21 


0.14 


35.16 


0.10 


0.17 


SN2006D 


2560 


18 


32.76 


0.13 


32.73 





06 


32.72 


0.06 


32.76 


0.12 


0.14 


SN2006N 


4468 


27 


33.99 


0.07 


33.97 





06 


33.98 


0.15 


33.95 


0.06 


0.16 


SN2006X 


1091 


20 


30.88 


0.30 


31.10 





07 


31.12 


0.03 


31.12 


0.09 


0.10 


SN2006ac 


7123 


17 


35.01 


0.05 


35.01 





05 


34.98 


0.14 


34.99 


0.09 


0.17 


SN2006ax 


4955 


20 


34.21 


0.07 


34.26 





06 


34.31 


0.09 


34.38 


0.06 


0.11 


SN2006cp 


6816 


14 


34.92 


0.05 


34.92 





05 


34.92 


0.14 


34.90 


0.04 


0.14 


SN2006gr 


10547 


22 


35.89 


0.03 


35.90 





03 


36.00 


0.19 


36.02 


0.09 


0.21 


SN20061e 


5403 


12 


34.40 


0.06 


34.50 





06 


34.63 


0.08 


34.69 


0.04 


0.09 


SN20061f 


4048 


10 


33.77 


0.08 


33.80 





07 


33.83 


0.11 


33.85 


0.17 


0.20 


SN2007cq 


7501 


50 


35.13 


0.05 


35.11 





05 


35.01 


0.19 


34.91 


0.03 


0.19 



Corrected to the CMB+Virgo frame. 
^ Z^post and (Tpost are the mean and standard deviation of the trained posterior density in the distance modulus, /^icsub 
and CTresub ^-^e the mean and standard deviation of the resubstituted predictive posterior of the distance modulus, ^p^cd 
and Spred the average and scatter over multiple bootstrapped training sets of the expected predictive distance moduli, 
^prcd quadrature sum of predictive uncertainty and the scatter over bootstrap predictions. 



terior density over the hyperparameters P{^^, 'S^l V, Z) 
obtained by integrating over Eq. [TH] This is to be con- 
trasted with simpler approaches that merely find point 
estimates of the model parameters. The process of pre- 
diction uses this posterior probability together with new 
light curve data Vg to compute the predictive posterior 
P{fls\ T^s^T^i Z). The marginalization over the hyperpa- 
rameters correctly incorporates the uncertainties in the 
means, variances, and correlations of the absolute magni- 
tudes and light curve shape parameters. Recall that the 
training set is V — {Vg} and Z = {Zg}. The resubsti- 
tution distance estimate is obtained by setting Vg — Vg 
and computing the predictive probability P(/is \T^s,T^i Z) 
for each supernova s in the training set. 

The uncertainty-weighted mean resubstitution error is 
then a sum over all resubstituted predicted distances for 
each supernova. 



err: 



rcsub 



E 

s=l 



Wg X 



Mrcsub 



s=l 



(31) 



the vari- 

„2 



rcsub, s ■ 



and 



un- 



where the expected predictive distance is 
ance is o-^csub s ^^"^ "^^^ weights are = a 
The resubstitution predictive distance moduli 
certainties are listed in Table 4. Figure [U shows the 
Hubble diagram constructed from the resubstitution dis- 
tance moduli from our JHKg statistical model. We com- 
pute the training resubstitution error over the training 



set SN with recession velocities 
find err rcsub = 0.10 mag. 



> 2000 km s"\ and 



4.4.2. Cross-Validation and Prediction Error 

The resubstitution error is an optimistic estimate of 
the predictive, or generalization, error arising from pre- 
dicting the distances of new SN la light curves that were 
not in the training set ( "out of sample" ) . The resubsti- 
tution prediction P{pLg \ "Dg, P, Z) conditions on the data 
'Dg twice: once during training when it is included in 
the training set 2?, Z and once in resubstitution to assess 
training error. Double use of the data for both training 
and evaluation is likely to lead to optimistic measures of 
predictive performance, underestimating the true predic- 
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Fig. 14. — Hubble diagram constructed by resubstitution of train- 
ing set NIR SN la light curves into the trained statistical model. 
The dotted lines indicate the uncertainty in distance modulus due 
to peculiar velocities. The average residual at cz > 2000 km s~^ 
is an excellent 0.10 mag. The three open circles are the SN with 
Ay > 2 as measured from the optical light curves with MLCS2k2. 



tive error. It is always possible to reduce the residuals of 
a fit to a finite, noisy sample by introducing arbitrarily 
more complex relations, but arbitrarily complex models 
will typically not generalize well to new data. We should 
compute the generalization error for out-of-sample cases 
to assess predictive performance of the statistical model. 
The distinction between the training error, or Hubble 
residual, and the expected prediction error has not been 
fully addressed in the literature on SN la light curve in- 
ference methods. In this section, we describe the novel 
application of a cross-validation procedure to assess the 
prediction error and to test sensitivity of the statistical 
model to the training set SN. 

To estimate the out-of-sample prediction error and 
to avoid using the light curve data twice for training 
and evaluation, we performed bootstra p cross-validation 
(lEfronl 119831: lEfron fc Tibshiranil [19971) . We sample SN 
with replacement from the original training set to sim- 
ulate the generation of alternative training sets of the 
same size. Because of the random resampling, each boot- 
strapped training set will typically contain duplicate SN 
data sets and will be missing others. Each bootstrapped 
training set of size n SN will be missing approximately 
(1 — 1/n)" « 37% of the SN in the original training set. 
The SN missing from the bootstrap training set form a 
prediction set, on which we assess the predictive error of 
a model trained on the complementary training set. Let 
P^, be a training set bootstrapped from the original 
T>^Z. Then the prediction set is I? \ T)^ . To train the 
statistical model we compute P(/x^, P^, Z^) as in 
Eq. [m For each supernova light curve 2?s £ \T> \ V^}, 
we compute the predictive density P(/is | P^, 2''^, 2^). 
This random process is repeated so that each supernova 
distance is predicted several times from different boot- 
strapped training sets. This process avoids using each 
SN light curve simultaneously for both prediction and 
training. 

We repeated this process fifty times for the original 



training set in Table 2. On average, each supernova is 
held out of the training set and its distance modulus fj,s is 
predicted about eighteen times. For each supernova, the 
average over all of its predictions /Xprcd and the standard 
deviation Sprod over all predictions are listed in Table 4. 
We also list the sum of the variance over predictions s^^.^^ 
and the average uncertainty of a prediction (the variance 
of P{fis\Vs,V^ , Z^)) as CTpi-od- Often the uncertainty 
of a single prediction is larger than the scatter of the 
predictions from different bootstrapped training sets, al- 
though this is not always true. In Fig. [15] we show the 
Hubble diagram of mean predicted distance moduli /^prcd 
and their total scatter cTpicd- Because we do not use the 
data twice for training and prediction, the scatter about 
the Hubble line is less tight than in the resubstitution 
Hubble diagram, Fig. [T4| 

The "leave-one-out" bootstrap error is computed as an 
uncertainty-weighted average of squared prediction er- 
rors. 



2^B=l 2^s£{V\V'^} 



Err?i 



X 



A*prcd,i3 -HPs\Zs) 



(1) 



where fi: 



(32) 



prcd,_B 



E(/is jPs, 2?^, 2^) and the weights are 
Var [/Zs I , 2?^ , ] . This bootstrap 



error estim ate is known to be upwar dly biased. 'Efron| 
(|1983[ ) and lEfron fc Tibshiranil (|1997D have shown that 
a better estimate of prediction error is obtained by aver- 
aging the bootstrap error with the resubstitution error, 
using the ".632 bootstrap estimator": 



Err' 



632 



0.632 X Err(i) + 0.368 x err^^^^^ 



(33) 



For the Hubble flow SN {cz > 2000 km s~^) in our 
sample, we compute this estimate of prediction error: 
Err.632 = 0.15 mag. This is a larger error than the resub- 
stitution error computed above, as expected. However, 
it is a more realistic estimate of predictive performance 
of distance estimation with our JHKg light curve model 
and the current SN sample. This result confirms that 
NIR SN la are excellent standard candles. 

This process of resampling of alternative training set 
tests how sensitive the predictions are to the composi- 
tion of the finite training set. If the statistical model is 
reasonable, and we had an infinite training set, we would 
expect the original training set to be representative of the 
population of NIR SN la and we would expect the resam- 
pled training sets (and also the complementary held out 
sets) to look like the original, and also be representative 
of the population. We would expect that the prediction 
error and the resubstitution error to be almost the same. 
Our actual training set is finite, so the resampled sets 
will not look exactly like the original set. This proce- 
dure tests the sensitivity to the finite sample in addition 
to making predictions without double use of the light 
curve data. 

The gap between the estimated prediction error (0.15 
mag) and the resubstitution error (0.10 mag) tells us that 
the trained statistical model is sensitive to the finite sam- 
pling of the training set. A larger training set we would 
be more robust to resampling and we expect that the fu- 
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Velocity [CMB+Virgo] (km/s) 

Fig. 15. — Hubble diagram constructed using predicted distances 
of NIR SN la light curves obtained by inferring the statistical model 
from 50 bootstrapped training sets. The error bars include both 
the predictive posterior uncertainty and the scatter over multiple 
bootstrapped predictions. The estimate of the prediction error for 
cz > 2000 km s~^ is an excellent 0.15 mag. The three open circles 
are the SN with Ay > 2 as measured from the optical light curves 
with MLCS2k2. 

ture predictive uncertainty will be in between the current 
resubstitution error and the estimated predictive error. 
A naive argument would suggest that if this gap of 0.05 
mag between the prediction error and the training error 
decreases with the square root of the number of SN la in 
the sample, then a set of ~ 200 SN la would reduce it to 
about 0.02 mag, and a few hundred would be needed to 
reduce it to ^ 0.01 mag. To build up statistical strength 
and further solidify our knowledge of the properties of 
SN la in the NIR, we are continuing our campaign to 
observe SN la in the near infrared with PAIRITEL. 

4.5. The Effect of Dust on the SN la NIR Sample 

The results presented thus far have ignored the effects 
of dust extinction in the NIR sample. We can exam- 
ine the possibility of extracting information about the 
dust distribution from the NIR by looking at the colors 
Jo — Hq, Jo — Kso, and Hq — Kgo at peak. From the pop- 
ulation hyperparameters we can compute the mean and 
standard deviation of these colors. The mean colors are 
-0.25, 0.0 and 0.25, and their population dispersions are 
0.14, 0.20, 0.17, respectively. This means that, for optical 
extinctions that are less than Ay 2 — 3, dust extinction 
in the near infrared cannot be clearly distinguished from 
intrinsic color variations, because of the diminished ef- 
fects of dust absorption in the NIR. Only SN with much 
greater NIR reddening carry information on the dust dis- 
tribution from their NIR data alone. Estimates of the op- 
tical Ay extinctions of the SN in our samp le from MLCS 
analysis of optical light curves (jHicken et a l. 2009a) were 
reported in WV08. There are only three highly reddened 
SN with Ay > 2: SN 1999cl, 2006X, and SN 2003cg 
(Ay = 3.49,3.83,4.20, respectively) in the sample of 39 
SN, and they are depicted in Figs. [TH [15] with open 
circles. Although they are redder in their NIR colors 
than the population mean, their colors are only about 
~ 1 — 2ct redder, so they are barely distinguishable from 



the intrinsic color variations. This conclusion does not 
change if we calculate the mean and standard deviation 
of the colors by including or excluding the highly opti- 
cally reddened SN. 

If we take the Ay estimates from the optical data at 
face value, we can estimate the likely effect of dust extinc- 
tion on our posterior estimates of absolute magnitudes. 
The relative weight of a particular supernova in posterior 
inferences about absolute magnitude- related quantities 
(means, variances and correlations) is inversely propor- 
tional to its magnitude uncertainty due to peculiar ve- 
locities: = c/cr^g and — J^s'^i^^s- Comparing 
these weights to the Ay for each SN, we find that 87% 
of the magnitude weight lies with SN with Ay < 0.5, 
97% of the weight is in Ay < 1, and 99.7% of the weight 
is in Ay < 2. The three SN with Ay > 2 have a to- 
tal weight of 0.32% in magnitude calculations. Although 
the highly reddened supernovae have large Hubble resid- 
uals in Figs. [Ml [151 since they are at low redshifts where 
the contribution of peculiar velocity uncertainties to their 
distance uncertainty is large, they have little influence on 
the posterior inferences about the NIR absolute magni- 
tudes. Furthermore, they have no effect on the estimates 
of the training error or prediction error, because only 
the Hubble flow SN at cz > 2000 km s"^ are useful for 
validation of the statistical model. 

The weighted mean Ay value of the sample is 
J2s ^%i^v — 0-23 mag. Assuming a CCM law with 
Ry — 2, this means that the estimated NIR absolute 
magnitudes would be impacted by mean extinctions of 
about Aj = 0.05, Ah = 0.03 and Ak = 0.02. The 
weighted scatters in the NIR extinctions implied by the 
Ay values are about cr{Aj) = 0.08, cr(A//) = 0.05, 
<j{Ak) = 0.03. If these are the dust contributions to the 
measured dispersions cr{Mx), then subtracting them in 
quadrature yields intrinsic dispersions of a{Mj) = 0.15, 
o-(M^) = 0.10, and cj{Mks) = 0.19. These rough esti- 
mates do not substantially change our results. 

We conclude that the NIR sample alone contains little 
if any information about the dust distribution and hence 
it is not worthwhile to use the full model described in §2 
at this time to infer the dust properties. Additionally, if 
we extrapolate the Ay estimates obtained from the opti- 
cal data to the near infrared, the estimated effect of dust 
extinction is fairly small. These rough estimates do not 
take into account the non-gaussianity of the dust distri- 
bution and a full Bayesian analysis of the directed graph 
in Fig. [H conditioned on both the NIR and optical data 
simultaneously will be required to obtain informative in- 
ferences about the dust properties (Mandel et al. 2009, 
in prep.). 

5. CONCLUSION 

We have constructed the hierarchical Bayesian formu- 
lation of statistical inference with SN la light curves, 
and represented the probabilistic structure using for- 
mal graphical models. Furthermore, we have presented 
a Markov Chain Monte Carlo algorithm that uses the 
conditional independence structure of the equivalent di- 
rected acyclic graph to efficiently sample the global pos- 
terior probability distribution over individual light curve 
parameters and population hyperparameters for train- 
ing the statistical model on the low-z data set, and for 
prediction on future SN la data. We have applied this 
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approach and computational method to the JHKg Ught 
curve data set compiled by WV08, including a recent ho- 
mogeneous set of light curves from PAIRITEL, and com- 
puted the joint posterior probabilities over all individual 
light curve parameters (Table 2) and the statistical char- 
acteristics of the population, including the covariance of 
absolute magnitudes and J-band light curve shape pa- 
rameters (Table 3). 

We summarize the assumptions of our statistical 
model. First, we have assumed that the normalized H- 
and Kg- band light curves of different SN are identical be- 
tween -10 and 20 days around maximum. Furthermore, 
we have posited a parametric light curve model for the 
J-band between -10 and 60 days that captures the vari- 
ations in the double-peaked structure. A quick look at 
the data and the template models we have constructed in 
Fig. [3] and Fig. [5]reveals that these are reasonable mod- 
els for the JHKs data. The major assumption in the 
application of our hierarchical model is that the param- 
eters governing the multi-band absolute light curves are 
drawn from a jointly multivariate Gaussian population 
distribution. This is the simplest multivariate distribu- 
tion that models correlations, and its use is reasonable 
in the absence of other guiding information. Our results 
(Figs. [51 fT5)) reveal no obvious deviations from this as- 
sumption, but this is certainly not proof, especially with 
a small sample. This assumption must be constantly re- 
evaluated in applications of the hierarchical framework 
to larger or different data sets or with other light curve 
models. In this paper, we have not estimated the dust- 
related aspects of Fig. [U because the effects of dust are 
small for our NIR sample. However, in future studies in 
conjunction with optical data, the full graph can be com- 
puted using BayeSN to perform probabilistic inference 
of the SN la population and dust distribution. 

The marginal intrinsic scatter in peak absolute magni- 
tudes were found to be a{Mj) = 0.17 ± 0.03, a{MH) = 
0.11±0.03, and (T(Mif3) = 0.19±0.04. We have presented 
the first quantitative measurements of the correlations of 
NIR absolute magnitudes with J-band light curve shape. 
We showed that with greater than 95% probability there 
are positive correlations between peak JHKs absolute 
magnitudes and the J-band post-trough rise rate. In- 
trinsically dimmer SN la light curves tend to rise to the 
second J-band maximum faster. Since in our J-band 
model, the post-second-peak decline rate is linked to rise 
rate, this also suggests that the late- time slopes of J- 
band light curves are steeper for dimmer SN. We have 
also quantitatively measured correlations of the rise rate 
with other aspects of the light curve shape (Table 3), 
which show that faster decline rates go with faster rise 
rates, shorter times to trough and shorter times to the 
second maximum. These results suggest that NIR SN la 
are excellent standard candles at peak, and they can be 
improved by using the information in the late-time light 
curve. 

These relations may be useful for better understanding 
of SN la progenitor explosions in conjunct ion with phys - 
ical modeling. The theoretical models of iKasenI (|2006l ) 
suggest that the structure of the secondary maximum in 
the NIR is related to the ionization evolution of the iron 



group elements in the SN atmosphere. They also indi- 
cate that NIR peak absolute magnitudes have relatively 
weak sensitivity to the input progenitor ^^Ni mass, with 
a dispersion of ^ 0.2 mag in J and K, and ~ 0.1 mag 
in H over models ranging from 0.4 to 0.9 solar masses of 
^®Ni. The optical and bolometric peak magnitudes have 
much larger variations over the same range of mass. Fur- 
ther observational studies of SN la in the NIR may place 
valuable constraints on theoretical explosion models. 

We constructed a Hubble diagram with the training 
set SN la, and found an average residual of 0.10 mag for 
cz > 2000 km s^^. We have also performed bootstrap 
cross-validation to estimate the out-of-sample prediction 
error, which was found to be an excellent 0.15 mag. The 
gap between these estimates suggests that a larger sam- 
ple is needed to solidify our inferences about the pop- 
ulation of near-infrared SN la light curves. Our group 
continues to collect an extensive set of nearby NIR SN 
la light curves using PAIRITEL. With an ever-growing 
data set, in the near future, we will be able to expand 
the model considered here to include more extensive light 
curve models in H and K^^ and to combine the NIR and 
optical data to gain a better understanding of SN colors 
and dust extinction (Mandel et al. 2009, in prep.). 

It is worth considering whether the propitious prop- 
erties of SN la in the NIR can be leveraged by future 
space missions for SN la cosmology. The measurement 
of dark energy properties by the NASA/DOE Joint Dark 
Energy Mission will be limited by systematic effects, in 
particular dust extinction. The diminished absorption 
by dust and the narrow dispersion of peak luminosities 
in the NIR, particularly in the Jf-band, may be crucial to 
the precise measurement of dark energy, if observations 
of high-z SN can be conducted in the rest-frame NIR. 
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APPENDIX 

A. CONDITIONAL INDEPENDENCE AND D-SEPARATION 

Let P{{6i}) be a joint distribution of random variables represented by a directed acyclic graph. Consider three 
disjoint subsets of the random variables {9i} : A,B and C. Two sets are marginally independent if P{A,B) — 
P{A)P{B). Two sets, A and B, are conditionally independent given a third set C if P{A,B\C) = P{A\C)P{B\C). 
Marginal independence between A and B can be seen in a directed graph because there will be no links between the 
nodes in set A and the nodes in set B. Conditional independence indicates that if the values of the nodes in C are 
known, then the variables in A and those in B are statistically independent. Graphically, this means that all directed 
or undirected paths (ignoring the arrows) from one set to the other are "blocked" by nodes in C. 

Conditional independenc e between tw o sets of nodes given a third set can be ascertained from a directed graph using 
the d-separation property ()Pear]||1988f ): A path between a node in A and a node in B is blocked at node 9i if (1) the 
intermediate node 9i is in set C and the arrows meet at 0i in a tail-to-tail or head-to-tail fashion (not convergent), or 
(2) the arrows meet head-to-head (convergent) and the intermediate node 9i is not in C, and neither are any of its 
descendants. The nodes A are d-separated from the nodes B given set C if all paths between elements in A and B are 
blocked. If the nodes A are d-separated from the nodes B by C, then A is conditionally independent from B given C. 

B. THE BAYESN ALGORITHM - MATHEMATICAL APPENDIX 

In this appendix, we present mathematical details of the BayeSN algorithm. Let xps ° , tpj^'^ , and 
indicate all the intrinsic parameters in ips other than the peak absolute magnitude, the linear shape parameters, and 
the nonlinear shape parameters in filter respectively. 

1. We have used the conjugate hyperprior density P(/x^,S^) defined in Eqns. [T^ and [T51 and choose the non- 
informative limit by setting kq — O^vq = —1, and Aq — sqI for small eq- Let xp be the sample mean of the 
{■0s}, and let = (V's — '0)(0s — 0)^ be the matrix sum of squared deviations from the mean. The con- 
ditional posterior density P{fi^,H^\ {ips}) can be decomposed as {ips} ~ Inv-WishartAr_i ([Aq -|- S^]^^) and 

{ips} ^ N{xp, S^/A^sn) ()Gelman et al.ll2003[ ). We first directly sample a new covariance matrix from the 
inverse Wishart distribution The matrix drawn in this way is guaranteed to be a proper covariance matrix (i.e. positive 
semi-definite). Conditional on that matrix we directly sample a new population mean fi^ from the multivariate normal 
distribution. 

2. Let A be the sample mean of the {^|^}. The conditional posterior density is P{ta \ {^|r}) ~ Inv- Gamma(A'gN — 

l,iVsNA). 

3a. Let A^-^ = l^W^l and Fa = Nl^ {Wf)-^[mf - i^(6>^L. J - -Z^f (©nl, J^l,^]- We can compute the 
population conditional expectation: = A*s + Af + E[Af(fg|'0s and the population conditional vari- 

ance C — Var [M(^^ I 05 ,/.t^,S^], using the conditioning property of the multivariate Gaussian distribution. 
Then the conditional density of Fq^ is normal N{Fqs\Fq,A) with variance A — {N^^ + C^^)^^ and mean 
Fo=A{N-^Fo + C~^F„). 

3b. Compute N'^ = if ^(0^^, J(W^/)~'^f (^nl, J and §[ = __NL^'^ {e^^J{Wn''[mf ~ IFp,, - 
-^o'(^NLs)]- The conditional population expectation and variance are: 0^ — £[0^ ^| ■j/'J^'^, /.i^, S^] and C — 
Var[0L ^1 /i^,, S^,]. The conditional posterior density of ^ is normal N{Q^ ^Q^^K) with covariance ma- 
trix A = (AT^i + C^^Y^ and mean df^ = A{N^'^9[ + C^^9[). Note that steps 3a and 3b could be combined by 
Gibbs sampling from P(Fo^s,0f J cpj^'^ , fig, As; fi^,'E^,TA,'Ds, Zg). 

3c. Compute the expectation and covariance of the conditional population density: 6^^ — IE[0nl s \ fp7^^'^ ^ f^ipj '^ip] 
and C = Var[0^Ll ■07^'"'^, /x^, S^]. The conditional posterior density of the nonlinear parameters in band F, 6^j^ 
is proportional to A^(mf | lFo,s + (^nl.J + -^f (^nl.J^l ^ ) x ^(^nl,sI ^nl> C). We obtain a proposal 0^1% ^ 
^(^NL,s' ^jump s)' and apply the Metropolis rejection rule. 

3d. We allow for the possibility that the probability density of the distance modulus conditioned on the redshift 
only, P(/^s|zs), may be mildly non-Gaussian. The conditional posterior density is P(/Xs|</)s, /x^, S^, 2?s, Zs) oc 
N{(j)a — As — vfis \ fJ'ip,'^-^) X P(fis\zs), and generally cannot be sampled directly. However, we can approximate 
P{lJ,s\zs) ~ N{fis\fig = f{zs), cr^J with a Gaussian using Eq. [H We choose the MetropoHs-Hasting proposal distribution 
to be g(/z*|0^ As,M^,S^) oc N{4>s -As- v^H fi^,!:^) x N{fi*s\ng,al) = N{fi*s\ il,al), where af, = {a'"^ + s-^)-\ 
and fi — <5'~^(cr^^/ig + s^'^p) is a weighted average of the distance information from the redshift and the light curves. 
The mean jl = s^t;-^S^^(0s — /x^ — As) and variance = {v'^'Ei'^^v)~^ describe the distance information from the 
individual SN light curves only. We draw a proposed /i* from Q. The Metropolis-Hastings ratio is computed from Eq. 
[2T] using the above conditional posterior density and the proposal density. After cancellation of terms, this simplifies 
to: r — P{fj,l\zs)N{fj,s\ fJ.g,af^)/P{fj.s\zs)N{fis\ fJ-g,crf^). If P{fJ.s\zs) is actually close to Gaussian, Eq. [3 then the M-H 
ratio is identically one, and the proposal /x* is always accepted, as this is the same as Gibbs sampling. If P{i2s\zs) is 
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mildly non-Gaussian, then r and the acceptance rate will be slightly less than one. 
3dP. For distance prediction, the distance modulus /i^ is Gibbs sampled from the conditional posterior density 

3e. Sample a proposed extinction A^* ^ Af(yl|^, cr^^j^p ^ g). The conditional posterior density of the extinction A^* 
is proportional to N{(f)s — v^ig — As(A^*)|/x,0, S^) x Expon(A^*| ta)- Apply Metropolis rejection. 

C. BAYESN - PRACTICAL CONSIDERATIONS 

The chain is seeded with initial starting estimates for all the parameters. It is useful before running the MCMC 
to obtain rough point estimates of the light curve parameters using, e.g. the maximum likelihood estimate (MLE). 
Point estimates of the distance moduli can be obtained from ^{ns \ Zs)- The extinction values A^^j can be chosen to 
be small random numbers. Random noise is added to these point estimates to generate different starting positions of 
each chain, to ensure that each chain begins in a different region. 

The Metropolis steps within the Gibbs scan use jumping kernels that must be tuned to generate efficient MCMC 
chains. The scalar kernels u-^ump.A.s are tuned to generate ~ 40% acceptance rates for their respective Metropolis steps. 
This is easily done by running a few preliminary short chains to compute the average acceptance rates and adjusting 
the jumping sizes accordingly. The nonlinear jumping kernel, Sj^^p ^ is a matrix if there are more than one nonlinear 
parameters in the light curve model for band F . This can be estimated from the inverse of the Fisher information 
matrix at the MLE estimate, or from the sample covariance of the ^ chain values, to reflect the shape of the 
underlying density. The overall size of the matrix is then scale d to produ ce acceptance rates of 40% in preliminary 
short runs, or ~ 23% if the dimensionality of ^ is high ( Ge lman et a l. 2003). Once the jumping kernels have been 
set to appropriate values, long chains are run. 

To assess the convergence of the MCMC, a few independent long chains with different initial positions are run. The 
BayeSN computation is easily be parallelized as each independent MCMC chain can be run on a separate processor. 
The Gelman- Rubin statistic (jGelman fc RubinI 11993 ) compares the between-chain variances with the within-chain 
variances in each parameter to compare the coverages of the chains . If the chains have converged, the Gelman-Rubin 
ratio should be close to 1. The sample paths of representative parameters are inspected visually to ascertain that the 
chains are well mixed. Upon convergence, the initial portions of each chain are discarded as "burn-in" , and the chains 
are concatenated for final inferences. 
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